Re: Problem with entityLinking on Uppercase tokens

Joseph M'Bimbi-Bene Fri, 07 Jun 2013 10:43:46 -0700

I also have a problem with the French OpenNLP model. As you suggested in a
previous mail, I tried to manually specify which POS tags returned by the
POS tagger should be linked.
Here is the configuration of the engine:


*;lmmtip;uc=LINK;lc=Noun;prob=0.55;pprob=0.01
fr;pos=NC

but it looks like 'pos=NC' is not a valid parameter. The engine is reported
as unavailable:

   -  *EDFAcronymeLinking* ( required , currently not available)

And in the log, I get the following message:
"07.06.2013 19:39:13.727 *ERROR* [CM Event Dispatcher (Fire
ConfigurationEvent:
pid=org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine.d05f6207-7998-4b28-85e7-f0a24b7ca34b)]
org.apache.stanbol.enhancer.engines.entityhublinking
[org.apache.stanbol.enhancer.engines.entityhublinking.EntityhubLinkingEngine]
The activate method has thrown an exception
(org.osgi.service.cm.ConfigurationException:
enhancer.engines.linking.processedLanguages : 'NC' of param 'pos' for
language 'fr'is not a member of the enum Pos(configured : 'NC')!)
org.osgi.service.cm.ConfigurationException:
enhancer.engines.linking.processedLanguages : 'NC' of param 'pos' for
language 'fr'is not a member of the enum Pos(configured : 'NC')!
    at
org.apache.stanbol.enhancer.engines.entitylinking.config.TextProcessingConfig.parseEnumParam(TextProcessingConfig.java:500)"

So how can I specify which POS tags to link? Should I add the POS tags to
the PosTagSetRegistry? Isn't there an easier way to do it?
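
For illustration, here is a minimal sketch of the kind of check that seems
to reject 'pos=NC'. It is not Stanbol's code: the Pos enum below is a
hypothetical stand-in with two made-up members, and parse_language_config
and parse_enum_param are names invented for this example. It only shows
that a 'pos' value is looked up by enum member name, so a tag-set specific
string such as 'NC' fails.

```python
from enum import Enum


class Pos(Enum):
    # Hypothetical stand-in, NOT the real Stanbol Pos enum.
    CommonNoun = "CommonNoun"
    ProperNoun = "ProperNoun"


def parse_language_config(line):
    """Split a line such as 'fr;pos=NC' into (language, parameters)."""
    parts = [p.strip() for p in line.split(";") if p.strip()]
    language, params = parts[0], {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        # Flags without '=' (e.g. 'lmmtip') are stored with value None.
        params[key] = value if sep else None
    return language, params


def parse_enum_param(language, params, name, enum_cls):
    """Resolve a comma-separated parameter to enum members or raise ValueError."""
    raw = params.get(name)
    if raw is None:
        return []
    members = []
    for item in raw.split(","):
        item = item.strip()
        if item not in enum_cls.__members__:
            raise ValueError(
                f"'{item}' of param '{name}' for language '{language}' "
                f"is not a member of the enum {enum_cls.__name__}"
            )
        members.append(enum_cls[item])
    return members
```

In this sketch, parsing "fr;pos=NC" and resolving 'pos' raises an error of
the same shape as the one in the log, while "fr;pos=CommonNoun" is accepted.
Whether the real engine accepts exactly these member names is an assumption.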


Then, just as an experiment, I tried
fr;tag=NP
With this the engine is available, but obviously it has nothing to do with
POS tagging, and the result is:

07.06.2013 19:41:35.853 *DEBUG* [Thread-2002]
org.apache.stanbol.enhancer.engines.entitylinking.impl.ProcessingState   >
205: Token: [1091, 1096] objet (pos:[Value [pos:
NC([])].prob=0.981918523624271]) chunk: 'none'
07.06.2013 19:41:35.853 *DEBUG* [Thread-2002]
org.apache.stanbol.enhancer.engines.entitylinking.impl.ProcessingState
- TokenData: 'objet'[linkable=false(*linkabkePos=false*)|
matchable=false(matchablePos=null)| alpha=true| seachLength=true|
upperCase=false]





2013/6/7 Joseph M'Bimbi-Bene <[email protected]>

> Aaaaaah, thanks a lot, I should have figured out the misspelling myself!
>
> Until I can find another pathological text, I isolated one and sent it to
> you in private, in case you have the opportunity to work on it. My
> colleagues told me it is not supposed to be published.
>
> And going back to "__3. Unknown POS tag Rules__" of issue no. 1049, I
> think the behaviour should be left to the user and be configurable.
> As far as I'm concerned, when POS tagging is available, I would like to
> simply ignore tokens without POS tags or with a tag probability below a
> specified threshold.
>
> In an ideal case, I would like to be able to edit/set the rules for
> linkable / matchable tokens from the web console.
>
>
>
>
> 2013/6/7 Rupert Westenthaler <[email protected]>
>
>> Hi
>>
>> On Thu, Jun 6, 2013 at 5:26 PM, Joseph M'Bimbi-Bene
>> <[email protected]> wrote:
>> > Hello, sorry for the late answer, and thank you for yours.
>> >
>> >
>> > 2013/6/3 Rupert Westenthaler <[email protected]>
>> >
>> >> Hi Joseph
>> >>
>> >> On Mon, Jun 3, 2013 at 3:43 PM, Joseph M'Bimbi-Bene
>> >> <[email protected]> wrote:
>> >> [..]
>> >> >
>> >> > Now, the logs of the processing of the token "La"
>> >> >
>> >> > ProcessingState > 0: Token: [1087, 1089] La (pos:[Value [pos:
>> >> > ADJ(olia:Adjective)].prob=0.016871281997002517]) chunk: 'none'
>> >> >
>> >> > ProcessingState - TokenData: 'La'[linkable=true(linkabkePos=null)|
>> >> > matchable=true(matchablePos=null)| alpha=true| seachLength=true|
>> >> > upperCase=true]
>> >> >
>> >>
>> >> The reason why the 'La' of the last sentence of your document is
>> >> marked as 'linkable' is the combination of the following things:
>> >>
>> >> 1. the POS tag has a very low probability (0.017) and is therefore
>> >> ignored, as the configured minimum probability is higher than that.
>> >>
>> >
>> > Actually, I set both parameters "prop" and "pprob" to 0.01. I didn't
>> > make a mistake, did I? You mentioned in a previous mail something
>> > about a strange tokenizing behaviour, which might be the source of a new
>> > problem. Here is, for example, a log excerpt from the Stanbol web console
>> > for an integration test. I isolated the pathological case:
>> >
>>
>> The reason is that "prop=0.01" should be "prob=0.01". There is a typo
>> in the default configuration; because of that, the changed value for
>> "prop" does not have any effect. I created STANBOL-1100 to fix this.
>>
>> > and when I send the text to Talismane with curl, I get the following
>> > message:
>> >
>> > 16:49:21,166 [main] INFO server.Main - ... starting server
>> > 16:53:55,560 [btpool0-2] ERROR resource.AnalysisResource - Exception while
>> > analysing Blob
>> > java.lang.IllegalArgumentException: Illegal span [2199,2201] for Token
>> > relative to Text: [0, 2200] : Span of the contained Token MUST NOT extend
>> > the others!
>>
>> When implementing the Talismane Stanbol integration I had a lot of
>> problems getting the index positions right. A Token whose Span exceeds
>> the size of the document could indicate that there are still some
>> problems with that.
>>
>> If you come across a text that can reproduce this, please open an issue
>> on the stanbol-talismane project [1]
>>
>> [1] https://github.com/westei/stanbol-talismane
>>
>>
OK, I will try it later. But from a glance at the docs, don't I need pull
access for that? I guess not everyone has it.
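
To help narrow down such a text, the containment rule stated in the
exception quoted above (a token's span must not extend beyond the text's
span) can be sketched as follows. This is only an illustration, not the
Stanbol or Talismane implementation; span_within and find_illegal_spans
are names made up for this example, and spans are assumed to be
(start, end) character offsets.

```python
def span_within(token_span, text_span):
    """Return True if the token span lies entirely inside the text span."""
    start, end = token_span
    text_start, text_end = text_span
    return text_start <= start <= end <= text_end


def find_illegal_spans(token_spans, text_span):
    """Collect the token spans that reach outside the text span."""
    return [span for span in token_spans if not span_within(span, text_span)]
```

With the values from the log, the span [2199,2201] is not within [0, 2200],
so it would be reported by find_illegal_spans.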


>> best
>> Rupert
>>
>>
>> --
>> | Rupert Westenthaler             [email protected]
>> | Bodenlehenstraße 11                             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
