Problem with entityLinking on Uppercase tokens

Joseph M'Bimbi-Bene Wed, 29 May 2013 10:55:05 -0700

Hello everybody, I am having some problems with the EntityhubLinkingEngine.
I am trying to spot mentions of abbreviations. Here is the extract of the
RDF describing the problematic entity:


<rdf:Description rdf:about="http://www.edf.fr/EdfAcronyme.owl#AE">
    <j.1:name>AE</j.1:name>
    <dc:description>Acoustic Emission; Architect Engineer </dc:description>
    <rdf:type rdf:resource="http://www.edf.fr/EdfAcronyme.owl#Acronyme"/>
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/>
</rdf:Description>

The language processing configuration is left at its default:
*;lmmtip;uc=LINK;prop=0.75;pprob=0.75
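For readers unfamiliar with this syntax, here is a minimal sketch of how the line above splits into a language selector and its parameters. It is not Stanbol's own parser. The function name `parse_language_config` is hypothetical, and treating a parameter without `=` as a boolean flag is an assumption made for illustration.

```python
def parse_language_config(line):
    """Split a 'language;flag;key=value' line into its parts (illustrative only)."""
    parts = [p.strip() for p in line.split(";") if p.strip()]
    language, params = parts[0], {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        # Assumption: a parameter written without '=' is a boolean flag.
        params[key] = value if sep else True
    return language, params


# The configuration line exactly as quoted in this message.
language, params = parse_language_config("*;lmmtip;uc=LINK;prop=0.75;pprob=0.75")
print(language, params)
```

Read this way, the setting that matters for my question is `uc=LINK`, applied to all languages through the `*` selector.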

"Proper Noun Linking" is deactivated.

Here is the text in which I try to spot my entity:

"attesté depuis 1480, Knecht (valet) indiquant *AE* une servitude vis-à-vis
de l’« employeur » et Land"

Here are some portions of the log from the processing of the tokens:

ProcessingState > 30: Token: [78, 80] *AE *(pos:[Value [pos:
NC(olia:CommonNoun|olia:Noun)].prob=0.2837978274717965]) chunk: 'none'

29.05.2013 19:36:41.243 *DEBUG* [Thread-112] ProcessingState - TokenData:
'AE'[*linkable=false*(linkabkePos=null)| matchable=true(matchablePos=null)|
alpha=true| seachLength=false| *upperCase=true*]
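To make the flags in that TokenData line easier to compare, they can be pulled out with a short script. This is only a sketch: the helper name `token_flags` is hypothetical, the asterisks are the emphasis markers I added in this mail (not part of the real log), and fields whose value is `null` are simply skipped.

```python
import re


def token_flags(log_line):
    """Return the name=true/false flags found in a TokenData log line."""
    cleaned = log_line.replace("*", "")  # drop the mail emphasis markers
    return {name: value == "true"
            for name, value in re.findall(r"(\w+)=(true|false)", cleaned)}


# The TokenData entry quoted above, joined onto one logical line.
line = ("'AE'[*linkable=false*(linkabkePos=null)| matchable=true(matchablePos=null)| "
        "alpha=true| seachLength=false| *upperCase=true*]")
flags = token_flags(line)
print(flags)
```

The two values of interest are `flags["upperCase"]` and `flags["linkable"]`: the token is recognised as upper case, yet it is not marked as linkable.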

Why is the token not linkable? According to the configuration ("uc=LINK"),
the token should be considered linkable. I thought the length of the token
should not come into play.

Here is the remainder of the log:

*preocess Token 29: uant AE u* (lemma: null) linkable=true, matchable=true
| chunk: none

EntityLinker - 28:'di' (lemma: null) linkable=false, matchable=false

EntityLinker + 30:'AE' (lemma: null) linkable=false, matchable=true

EntityLinker >> searchStrings [uant AE u, AE]

EntityLinker - found 1 entities ...

EntityLinker > http://www.edf.fr/EdfAcronyme.owl#AE (ranking: null)

MainLabelTokenizer > use Tokenizer class
org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.lucene.LuceneLabelTokenizer
for language null

29.05.2013 19:36:41.277 *TRACE* [Thread-112]
org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.lucene.LuceneLabelTokenizer
Language null not configured to be supported

MainLabelTokenizer > use Tokenizer class
org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.lucene.LuceneLabelTokenizer
for language null

29.05.2013 19:36:41.277 *TRACE* [Thread-112]
org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.lucene.LuceneLabelTokenizer
Language null not configured to be supported

MainLabelTokenizer > use Tokenizer class
org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer
for language null

MainLabelTokenizer - tokenized ae -> [ae]

EntityLinker - no match

*EntityLinker --- preocess Token 33: ser* (lemma: null) linkable=true,
matchable=true | chunk: none

EntityLinker - 32:'e' (lemma: null) linkable=false, matchable=false

EntityLinker + 34:'servitude' (lemma: null) linkable=true, matchable=true

EntityLinker >> searchStrings [ ser, servitude]

EntityLinker - found 0 entities ...


As we can see, "AE" is never processed as a linkable token. What am I doing
wrong? Thank you in advance.
