Re: problem with entityhub

Joseph M'Bimbi-Bene Mon, 06 May 2013 01:49:12 -0700

I thought it might be a bug caused by the absence of POS tagging, etc., so I
used Talismane for the NLP tasks. I configured the EntityhubLinkingEngine to
link adjectives, since that is what Talismane tags "mario" as, but it doesn't
change anything. Here are the logs:


.EntityLinker --- preocess Token 117: *moustachu* (lemma: none | pos:[Value [pos: ADJ(olia:Adjective)].prob=0.4520518431389538]) chunk: none
.EntityLinker - 116:'*plombier*' (lemma: none | pos:[Value [pos: NC(olia:CommonNoun|olia:Noun)].prob=0.6784572817881412])
.EntityLinker + 118:'supérieure' (lemma: none | pos:[Value [pos: ADJ(olia:Adjective)].prob=0.9366843193563169])
.EntityLinker >> searchStrings *[moustachu, supérieure]*
.EntityLinker - found 1 entities ...
.EntityLinker > http://example.org/resource/Mario (ranking: null)
.MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
.MainLabelTokenizer - tokenized le plombier moustachu -> *[le, plombier, moustachu]*
.MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
.MainLabelTokenizer - tokenized Mario -> [Mario]
.EntityLinker - *no match*

Why isn't "plombier" in the searchStrings? Even though I configured the engine
so that adjectives are linkable tokens, according to the documentation
"plombier" should still be a "matchable token". The behavior of this engine
is quite disturbing...
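
To make the expected behavior concrete, here is a minimal Python sketch of the matching I would expect between the tokens of a mention and the tokens of an entity label. It is only an illustration and not Stanbol's implementation: `tokenize` and `match_label` are hypothetical helpers, and the whitespace tokenizer is an assumption standing in for the real label tokenizer.

```python
def tokenize(text):
    """Whitespace tokenizer (assumption: stands in for a real NLP tokenizer)."""
    return text.split()


def match_label(mention_tokens, label, case_sensitive=False):
    """Return (matched, label_length).

    `matched` is the longest run of consecutive label tokens that equals
    the leading tokens of the mention, starting anywhere in the label.
    """
    label_tokens = tokenize(label)
    if not case_sensitive:
        # Case-insensitive mode: compare lower-cased forms on both sides.
        mention_tokens = [t.lower() for t in mention_tokens]
        label_tokens = [t.lower() for t in label_tokens]

    best = 0
    for start in range(len(label_tokens)):
        length = 0
        while (length < len(mention_tokens)
               and start + length < len(label_tokens)
               and label_tokens[start + length] == mention_tokens[length]):
            length += 1
        best = max(best, length)
    return best, len(label_tokens)


# "plombier moustachu" covers 2 of the 3 tokens of the altLabel.
print(match_label(["plombier", "moustachu"], "le plombier moustachu"))
# "mario" should match the prefLabel "Mario" when case is ignored.
print(match_label(["mario"], "Mario"))
print(match_label(["mario"], "Mario", case_sensitive=True))
```

With both "plombier" and "moustachu" in the search strings, the altLabel is covered on 2 of its 3 tokens. With only one of them, at most 1 of 3 tokens can match, which may be why the suggestion is dropped.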


2013/5/6 Joseph M'Bimbi-Bene <[email protected]>

> Hello everybody, I'm having some problems with the EntityhubLinkingEngine.
> Until about 2 weeks ago, I used it for NER tasks on a custom vocabulary
> and it worked fine. Now I cannot spot entities whose label spans several
> words (even with the parameter lmmtip in "languages configuration"), and it
> now seems to be case sensitive, even though it is configured not to be.
>
> Here is what my entity looks like
>
> <rdf:Description rdf:about="http://example.org/resource#Mario">
>         <skos:prefLabel>Mario</skos:prefLabel>
>         <skos:altLabel>le plombier moustachu</skos:altLabel>
>         <rdf:type>http://example.org/concept#gentil</rdf:type>
>         <rdf:type>http://example.org/concept#humain</rdf:type>
> </rdf:Description>
>
> I want to spot it with the mention "plombier moustachu".
> Here is a log illustrating what I used to get:
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker ---
> preocess Token 825: plombier (lemma: none | pos:[]) chunk: none
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     -
> 824:'le' (lemma: none | pos:[])
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     +
> 826:'moustachu' (lemma: none | pos:[])
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker   >>
> searchStrings [plombier, moustachu]
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker    -
> found 1 entities ...
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker     >
> http://example.org/resource#Mario
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker       <
> le plombier moustachu[m=FULL,s=3,c=3(1.0)/3] score=1.0[l=1.0,t=1.0] for
> http://example.org/resource#Mario
>
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker   >>
> Suggestions:
> 18.04.2013 14:37:15.794 *DEBUG* [Thread-303]
> org.apache.stanbol.enhancer.engines.entitylinking.impl.EntityLinker    - 0:
> le plombier moustachu[m=FULL,s=3,c=3(1.0)/3] score=1.0[l=1.0,t=1.0] for
> http://example.org/resource#Mario
>
> And here is what I get now, starting with the processing of the token
> "plombier":
>
> EntityLinker --- *preocess Token 17: plombier* (lemma: none | pos:[]) chunk: none
> EntityLinker     - 16:'le' (lemma: none | pos:[])
> EntityLinker     - 18:'*moustachu*' (lemma: none | pos:[])
> EntityLinker     - 15:'sont' (lemma: none | pos:[])
> EntityLinker     - 19:'des' (lemma: none | pos:[])
> EntityLinker     - 14:',' (lemma: none | pos:[])
> EntityLinker     - 20:'collines' (lemma: none | pos:[])
> EntityLinker   >> *searchStrings [plombier]*
> MainLabelTokenizer  > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
> MainLabelTokenizer    - tokenized le plombier moustachu -> *[le, plombier, moustachu]*
> MainLabelTokenizer  > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
> MainLabelTokenizer    - tokenized Mario -> [Mario]
> EntityLinker       - *no match*
>
> Why aren't "plombier" and "moustachu" both in the searchStrings, just as
> before? And now with the processing of "mario":
>
>  .EntityLinker --- preocess Token 16: *mario* (lemma: none | pos:[])
> chunk: none
> .EntityLinker - 15:'sont' (lemma: none | pos:[])
> .EntityLinker - 17:'des' (lemma: none | pos:[])
> .EntityLinker - 14:',' (lemma: none | pos:[])
> .EntityLinker - 18:'collines' (lemma: none | pos:[])
> .EntityLinker - 13:'mendips' (lemma: none | pos:[])
> .EntityLinker - 19:'situées' (lemma: none | pos:[])
> .EntityLinker >> searchStrings *[mario]*
> .EntityLinker - found 1 entities ...
> .EntityLinker > http://example.org/resource/Mario (ranking: null)
> .MainLabelTokenizer > use Tokenizer class
> org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
> for language null
> .MainLabelTokenizer - tokenized le plombier moustachu -> [le, plombier,
> moustachu]
> .MainLabelTokenizer > use Tokenizer class
> org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
> for language null
> .MainLabelTokenizer - tokenized Mario -> *[Mario]*
> .EntityLinker - *no match*
>
> Why isn't "mario" matched against "Mario"? I configured the engine so
> that it is not case sensitive.
>
> As you can see, within the MaxTokenSearchDistance I still have the "le" and
> "moustachu" tokens, but they don't go into the searchStrings for the lookup.
> As a result, the enhancement is now pretty bad. What is going on?
>
> Thank you a lot in advance
>
