Hello everybody,

I am having some problems with the EntityhubLinkingEngine. I am trying to spot mentions of abbreviations. Here is an extract of the RDF describing the problematic entity:

<rdf:Description rdf:about="http://www.edf.fr/EdfAcronyme.owl#AE">
  <j.1:name>AE</j.1:name>
  <dc:description>Acoustic Emission; Architect Engineer </dc:description>
  <rdf:type rdf:resource="http://www.edf.fr/EdfAcronyme.owl#Acronyme"/>
  <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#NamedIndividual"/>
</rdf:Description>

The language processing configuration is left at its default:

  *;lmmtip;uc=LINK;prop=0.75;pprob=0.75

"Proper Noun Linking" is deactivated.

Here is the text in which I am trying to spot my entity:

  "attesté depuis 1480, Knecht (valet) indiquant *AE* une servitude vis-à-vis de l’« employeur » et Land"

Here are some portions of the log from the processing of the tokens:

  ProcessingState > 30: Token: [78, 80] *AE* (pos:[Value [pos: NC(olia:CommonNoun|olia:Noun)].prob=0.2837978274717965]) chunk: 'none'
  29.05.2013 19:36:41.243 *DEBUG* [Thread-112] ProcessingState - TokenData: 'AE'[*linkable=false*(linkabkePos=null)| matchable=true(matchablePos=null)| alpha=true| seachLength=false| *upperCase=true*]

Why is the token not linkable? According to the configuration ("uc=LINK"), an upper-case token should be considered linkable. I thought the length of the token should not come into play.

Here is the rest of the log:

  *preocess Token 29: uant AE u* (lemma: null) linkable=true, matchable=true | chunk: none
  EntityLinker - 28:'di' (lemma: null) linkable=false, matchable=false
  EntityLinker + 30:'AE' (lemma: null) linkable=false, matchable=true
  EntityLinker >> searchStrings [uant AE u, AE]
  EntityLinker - found 1 entities ...
  EntityLinker > http://www.edf.fr/EdfAcronyme.owl#AE (ranking: null)
  MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.lucene.LuceneLabelTokenizer for language null
  29.05.2013 19:36:41.277 *TRACE* [Thread-112] org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.lucene.LuceneLabelTokenizer Language null not configured to be supported
  MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.lucene.LuceneLabelTokenizer for language null
  29.05.2013 19:36:41.277 *TRACE* [Thread-112] org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.lucene.LuceneLabelTokenizer Language null not configured to be supported
  MainLabelTokenizer > use Tokenizer class org.apache.stanbol.enhancer.engines.entitylinking.labeltokenizer.opennlp.OpenNlpLabelTokenizer for language null
  MainLabelTokenizer - tokenized ae -> [ae]
  EntityLinker - no match
  *EntityLinker --- preocess Token 33: ser* (lemma: null) linkable=true, matchable=true | chunk: none
  EntityLinker - 32:'e' (lemma: null) linkable=false, matchable=false
  EntityLinker + 34:'servitude' (lemma: null) linkable=true, matchable=true
  EntityLinker >> searchStrings [ ser, servitude]
  EntityLinker - found 0 entities ...

As we can see, "AE" is never processed. What am I doing wrong?

Thank you in advance.