I also have a problem with French OpenNLP. As you told me in a previous mail, I tried to manually specify which POS tags returned by the POS tagger should be linked. Here is the config of the engine:
*;lmmtip;uc=LINK;lc=Noun;prob=0.55;pprob=0.01
fr;pos=NC

but it looks like it is not a correct parameter. The engine is then listed as:

- *EDFAcronymeLinking* ( required , currently not available)

And in the trace, I get the following message:

07.06.2013 19:39:13.727 *ERROR* [CM Event Dispatcher (Fire ConfigurationEvent: pid=org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine.d05f6207-7998-4b28-85e7-f0a24b7ca34b)] org.apache.stanbol.enhancer.engines.entityhublinking [org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine] The activate method has thrown an exception (org.osgi.service.cm.ConfigurationException: enhancer.engines.linking.processedLanguages : 'NC' of param 'pos' for language 'fr'is not a member of the enum Pos(configured : 'NC')!)
org.osgi.service.cm.ConfigurationException: enhancer.engines.linking.processedLanguages : 'NC' of param 'pos' for language 'fr'is not a member of the enum Pos(configured : 'NC')!
    at org.apache.stanbol.enhancer.engines.entitylinking.config.TextProcessingConfig.parseEnumParam(TextProcessingConfig.java:500)

So how can I specify which POS tags should be linked? Should I add POS tags to the PosTagSetRegistry? Isn't there an easier way to do it?

Then, just for the sake of trying, I tried

fr;tag=NP

Then the engine is available, but obviously this has nothing to do with POS tagging, and here is the result:

07.06.2013 19:41:35.853 *DEBUG* [Thread-2002] org.apache.stanbol.enhancer.engines.entitylinking.impl.ProcessingState > 205: Token: [1091, 1096] objet (pos:[Value [pos: NC([])].prob=0.981918523624271]) chunk: 'none'
07.06.2013 19:41:35.853 *DEBUG* [Thread-2002] org.apache.stanbol.enhancer.engines.entitylinking.impl.ProcessingState - TokenData: 'objet'[linkable=false(*linkabkePos=false*)| matchable=false(matchablePos=null)| alpha=true| seachLength=true| upperCase=false]

2013/6/7 Joseph M'Bimbi-Bene <jbi...@object-ive.com>

> Aaaaaah, thank you a lot, I should have figured out the misspelling myself!
>
> For now, until I find another pathological text, I have isolated one and
> sent it to you privately, in case you have the opportunity to work on it.
> My colleagues told me it is not supposed to be published.
>
> And coming back to "__3. Unknown POS tag Rules__" of issue n°1049, I
> think the behaviour should be left to the user and be configurable. As far
> as I'm concerned, when POS tagging is available, I would like to simply
> ignore tokens without POS tags or whose POS probability is below a
> specified threshold.
>
> Ideally, I would like to be able to edit/set the rules for linkable /
> matchable tokens from the web console.
>
>
>
>
> 2013/6/7 Rupert Westenthaler <rupert.westentha...@gmail.com>
>
>> Hi
>>
>> On Thu, Jun 6, 2013 at 5:26 PM, Joseph M'Bimbi-Bene
>> <jbi...@object-ive.com> wrote:
>> > Hello, sorry for the late answer. Thank you for yours.
>> >
>> >
>> > 2013/6/3 Rupert Westenthaler <rupert.westentha...@gmail.com>
>> >
>> >> Hi Joseph
>> >>
>> >> On Mon, Jun 3, 2013 at 3:43 PM, Joseph M'Bimbi-Bene
>> >> <jbi...@object-ive.com> wrote:
>> >> [..]
>> >> >
>> >> > Now, the logs of the processing of the token "La":
>> >> >
>> >> > ProcessingState > 0: Token: [1087, 1089] La (pos:[Value [pos:
>> >> > ADJ(olia:Adjective)].prob=0.016871281997002517]) chunk: 'none'
>> >> >
>> >> > ProcessingState - TokenData: 'La'[linkable=true(linkabkePos=null)|
>> >> > matchable=true(matchablePos=null)| alpha=true| seachLength=true|
>> >> > upperCase=true]
>> >> >
>> >>
>> >> The reason why the 'La' of the last sentence of your document is
>> >> marked as 'linkable' is the combination of the following things:
>> >>
>> >> 1. the POS tag has a very low probability (0.017) and is therefore
>> >> ignored, as the configured minimum probability is higher than that.
>> >>
>> >
>> > Actually, I set both parameters "prop" and "pprob" to 0.01. I didn't
>> > make any mistake, did I?
>> > In a previous mail you mentioned a strange tokenizing behaviour; it
>> > might be the source of a new problem. Here is, for example, a log
>> > excerpt from the Stanbol web console for an integration test. I
>> > isolated the pathological case:
>> >
>>
>> The reason is that "prop=0.01" should be "prob=0.01". There is a typo
>> in the default configuration; because of that, the changed value for
>> "prop" does not have any effect. I created STANBOL-1100 to fix
>> this.
>>
>> > and when I curl the text to Talismane, I get the following message:
>> >
>> > 16:49:21,166 [main] INFO server.Main - ... starting server
>> > 16:53:55,560 [btpool0-2] ERROR resource.AnalysisResource - Exception while
>> > analysing Blob
>> > java.lang.IllegalArgumentException: Illegal span [2199,2201] for Token
>> > relative to Text: [0, 2200] : Span of the contained Token MUST NOT extend
>> > the others!
>>
>> When implementing the Talismane Stanbol integration I had a lot of
>> problems with getting the index positions right. Getting a Span of
>> a Token exceeding the size of the document could indicate that there
>> are still some problems with that.
>>
>> If you come across a text that reproduces this, please open an issue
>> on the stanbol-talismane project [1].
>>
>> [1] https://github.com/westei/stanbol-talismane
>>
> OK, I will try it later. But from a glance at the documentation,
> shouldn't I have pull access? I guess not everyone has it.
>> best
>> Rupert
>>
>>
>> --
>> | Rupert Westenthaler rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11 ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
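A minimal, hypothetical Java sketch of the configuration behaviour discussed in this thread. It assumes that the "pos" parameter only accepts constant names of a tagset-independent enum, while "tag" accepts the raw tags emitted by the tagger (the fact that fr;tag=NP was accepted while fr;pos=NC was rejected points that way, but this is an inference, not confirmed Stanbol behaviour). The class LinkingConfigSketch, its tiny Pos enum, the 0.75 default threshold and the method linkableByPos are invented for illustration; this is not Stanbol's TextProcessingConfig. The sketch also mimics two other points from the thread: a tag whose probability is below the configured minimum yields no POS-based decision (like linkabkePos=null for 'La'), and an unknown key such as the misspelled "prop" is silently ignored.

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of parsing a per-language linking config line such as
 * "fr;tag=NC;prob=0.5". NOT Stanbol's TextProcessingConfig; all names and
 * rules are assumptions made for illustration.
 */
public class LinkingConfigSketch {

    /** Stand-in for a tagset-independent POS enum (tiny assumed subset). */
    enum Pos { CommonNoun, ProperNoun, Adjective }

    final Set<Pos> linkedPos = EnumSet.noneOf(Pos.class);
    final Set<String> linkedTags = new HashSet<>();
    double minPosProb = 0.75; // assumed default, overridden by "prob"

    /** Parses "lang;key=value;..." (the leading language code is skipped). */
    static LinkingConfigSketch parse(String line) {
        LinkingConfigSketch cfg = new LinkingConfigSketch();
        String[] parts = line.split(";");
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split("=", 2);
            String key = kv[0].trim();
            String value = kv.length > 1 ? kv[1].trim() : "";
            switch (key) {
                case "pos":
                    for (String v : value.split(",")) {
                        try {
                            // Only enum constant names are accepted here.
                            cfg.linkedPos.add(Pos.valueOf(v.trim()));
                        } catch (IllegalArgumentException e) {
                            // A tagset-specific tag such as "NC" is rejected,
                            // similar to the ConfigurationException in the mail.
                            throw new IllegalArgumentException("'" + v.trim()
                                    + "' of param 'pos' is not a member of the enum Pos", e);
                        }
                    }
                    break;
                case "tag":
                    // Raw tagger tags are plain strings, so "NC" is accepted.
                    for (String v : value.split(",")) {
                        cfg.linkedTags.add(v.trim());
                    }
                    break;
                case "prob":
                    cfg.minPosProb = Double.parseDouble(value);
                    break;
                default:
                    // Unknown keys (e.g. the misspelled "prop") are ignored,
                    // so they have no effect on the configuration.
                    break;
            }
        }
        return cfg;
    }

    /**
     * TRUE/FALSE if the POS annotation decides linkability, or null when the
     * annotation is ignored because its probability is below the threshold.
     */
    Boolean linkableByPos(String tag, Pos mapped, double prob) {
        if (prob < minPosProb) {
            return null; // low-confidence tag: leave the decision to other rules
        }
        return linkedTags.contains(tag) || (mapped != null && linkedPos.contains(mapped));
    }

    public static void main(String[] args) {
        LinkingConfigSketch byTag = parse("fr;tag=NC;prob=0.5");
        System.out.println(byTag.linkableByPos("NC", null, 0.98));             // true
        System.out.println(byTag.linkableByPos("ADJ", Pos.Adjective, 0.017)); // null
        System.out.println(parse("fr;tag=NC;prop=0.01").minPosProb);         // 0.75
        try {
            parse("fr;pos=NC");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, a raw tag like NC belongs in "tag", while "pos" would need an enum constant name; whether Stanbol's real enum and tagset mapping behave exactly like this would need checking against the TextProcessingConfig source.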