I thought it might be a bug caused by the absence of POS tagging, etc., so I used Talismane for the NLP tasks. I configured the EntityhubLinkingEngine to link adjectives, since that is what Talismane tags "mario" as, but it doesn't change anything. Here are the logs:

.EntityLinker --- preocess Token 117: *moustachu* (lemma: none | pos:[Value [pos: ADJ(olia:Adjective)].prob=0.4520518431389538]) chunk: none
.EntityLinker - 116:'*plombier*' (lemma: none | pos:[Value [pos: NC(olia:CommonNoun|olia:Noun)].prob=0.6784572817881412])
.EntityLinker + 118:'supérieure' (lemma: none | pos:[Value [pos: ADJ(olia:Adjective)].prob=0.9366843193563169])
.EntityLinker >> searchStrings *[moustachu, supérieure]*
.EntityLinker - found 1 entities ...
.EntityLinker > http://example.org/resource/Mario (ranking: null)
.MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
.MainLabelTokenizer - tokenized le plombier moustachu -> *[le, plombier, moustachu]*
.MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
.MainLabelTokenizer - tokenized Mario -> [Mario]
.EntityLinker - *no match*

Why isn't "plombier" in searchStrings? Even though I configured the engine so that adjectives are linkable tokens, according to the documentation "plombier" should still be a "matchable token". The behavior of this engine is quite disturbing...

2013/5/6 Joseph M'Bimbi-Bene <[email protected]>

> Hello everybody, I'm having some problems with the EntityhubLinkingEngine.
> Until about 2 weeks ago, I used it for NER tasks on a custom vocabulary
> and it worked fine. Now I cannot spot entities whose label spans several
> words (even with the parameter lmmtip in "languages configuration"), and it
> now seems to be case sensitive, even though it is configured not to be.
>
> Here is what my entity looks like:
>
> <rdf:Description rdf:about="http://example.org/resource#Mario">
>   <skos:prefLabel>Mario</skos:prefLabel>
>   <skos:altLabel>le plombier moustachu</skos:altLabel>
>   <rdf:type>http://example.org/concept#gentil</rdf:type>
>   <rdf:type>http://example.org/concept#humain</rdf:type>
> </rdf:Description>
>
> And I want to spot it with the mention "plombier moustachu".
> Here is a log illustrating what I used to have:
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker --- preocess Token 825: plombier (lemma: none | pos:[]) chunk: none
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker - 824:'le' (lemma: none | pos:[])
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker + 826:'moustachu' (lemma: none | pos:[])
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker >> searchStrings [plombier, moustachu]
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker - found 1 entities ...
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker > http://example.org/resource#Mario
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker < le plombier moustachu[m=FULL,s=3,c=3(1.0)/3] score=1.0[l=1.0,t=1.0] for http://example.org/resource#Mario
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker >> Suggestions:
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303] org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker - 0: le plombier moustachu[m=FULL,s=3,c=3(1.0)/3] score=1.0[l=1.0,t=1.0] for http://example.org/resource#Mario
>
> And here is what I have now, with the processing of the token "plombier":
>
> EntityLinker --- *preocess Token 17: plombier* (lemma: none | pos:[]) chunk: none
> EntityLinker - 16:'le' (lemma: none | pos:[])
> EntityLinker - 18:*'moustachu'* (lemma: none | pos:[])
> EntityLinker - 15:'sont' (lemma: none | pos:[])
> EntityLinker - 19:'des' (lemma: none | pos:[])
> EntityLinker - 14:',' (lemma: none | pos:[])
> EntityLinker - 20:'collines' (lemma: none | pos:[])
> EntityLinker >> *searchStrings [plombier]*
> .MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
> MainLabelTokenizer - tokenized le plombier moustachu -> *[le, plombier, moustachu]*
> MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
> MainLabelTokenizer - tokenized Mario -> [Mario]
> EntityLinker - *no match*
>
> Why aren't both "plombier" and "moustachu" in the searchStrings, just as before?
>
> And now with the processing of "mario":
>
> .EntityLinker --- preocess Token 16: *mario* (lemma: none | pos:[]) chunk: none
> .EntityLinker - 15:'sont' (lemma: none | pos:[])
> .EntityLinker - 17:'des' (lemma: none | pos:[])
> .EntityLinker - 14:',' (lemma: none | pos:[])
> .EntityLinker - 18:'collines' (lemma: none | pos:[])
> .EntityLinker - 13:'mendips' (lemma: none | pos:[])
> .EntityLinker - 19:'situées' (lemma: none | pos:[])
> .EntityLinker >> searchStrings *[mario]*
> .EntityLinker - found 1 entities ...
> .EntityLinker > http://example.org/resource/Mario (ranking: null)
> .MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
> .MainLabelTokenizer - tokenized le plombier moustachu -> [le, plombier, moustachu]
> .MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
> .MainLabelTokenizer - tokenized Mario -> *[Mario]*
> .EntityLinker - *no match*
>
> Why isn't "mario" matched against "Mario"? I configured the engine so
> that it is not case sensitive.
>
> As you can see, within the MaxTokenSearchDistance I still have the "le" and
> "moustachu" tokens, but they don't go into the searchStrings for lookup.
> The result of the enhancement is now pretty bad. What is going on?
>
> Thank you very much in advance.
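
For illustration, the matching behaviour expected in the message above can be sketched as follows. This is a simplified stand-in written in Python, not Stanbol's EntityLinker implementation: the function names (`tokenize`, `match_label`, `lookup`) and the in-memory `LABELS` table are hypothetical, and the whitespace tokenizer is an assumption that only approximates what a real label tokenizer does.

```python
# Hypothetical vocabulary mirroring the entity from the message above.
LABELS = {
    "http://example.org/resource#Mario": ["Mario", "le plombier moustachu"],
}


def tokenize(text):
    """Lower-case and split on whitespace (assumed stand-in for a label tokenizer)."""
    return text.lower().split()


def match_label(mention, label):
    """Return (matched_tokens, label_tokens) if the mention's tokens occur as a
    contiguous, case-insensitive run inside the label's tokens, else None."""
    mention_tokens = tokenize(mention)
    label_tokens = tokenize(label)
    if not mention_tokens:
        return None
    width = len(mention_tokens)
    for start in range(len(label_tokens) - width + 1):
        if label_tokens[start:start + width] == mention_tokens:
            return width, len(label_tokens)
    return None


def lookup(mention):
    """Collect (uri, label, (matched, total)) for every label the mention matches."""
    hits = []
    for uri, labels in LABELS.items():
        for label in labels:
            result = match_label(mention, label)
            if result is not None:
                hits.append((uri, label, result))
    return hits


if __name__ == "__main__":
    print(lookup("mario"))
    print(lookup("plombier moustachu"))
```

Under these assumptions, "mario" matches the label "Mario" in full, and "plombier moustachu" matches two of the three tokens of "le plombier moustachu", which corresponds to the two cases reported as "no match" in the logs.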
