RE: POS Tagger Probability changes between OpenNLP 1.5.1 and 1.5.2

Jeyendran Balakrishnan Tue, 22 May 2012 01:07:01 -0700

FWIW, I'm using the POS tagger in 1.5.2, and getting probabilities between 
0.5 and 1 for most tags [tested on a few hundred sentences].
I use tagger.tag(String[] tokens) to get the POS tags, and tagger.probs() 
immediately afterwards to get the probabilities.
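
A minimal sketch of how the two arrays line up and how a cutoff such as the 0.8 mentioned in the original report quoted below would be applied. The hard-coded arrays are stand-ins for what tagger.tag(tokens) and tagger.probs() return (the values are copied from that report); PosConfidenceFilter and filterByConfidence are hypothetical names for illustration, not part of OpenNLP.

```java
import java.util.Arrays;

class PosConfidenceFilter {

    /**
     * Returns a copy of tags in which every tag whose probability is below
     * minProb is replaced by null. probs[i] is assumed to belong to tags[i].
     */
    static String[] filterByConfidence(String[] tags, double[] probs, double minProb) {
        if (tags.length != probs.length) {
            throw new IllegalArgumentException("tags and probs must have the same length");
        }
        String[] kept = new String[tags.length];
        for (int i = 0; i < tags.length; i++) {
            kept[i] = probs[i] >= minProb ? tags[i] : null;
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for tagger.tag(tokens): tags reported for the example sentence.
        String[] tags = {"DT", "JJ", "NN", "TO", "DT", "JJS", "NN", "IN", "NNP", "."};

        // Stand-ins for tagger.probs(): values reported with 1.5.1 and with 1.5.2.
        double[] probs151 = {1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
                0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
                0.9999998327848956};
        double[] probs152 = {0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
                0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
                0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
                0.04233395769746884};

        System.out.println("1.5.1: " + Arrays.toString(filterByConfidence(tags, probs151, 0.8)));
        System.out.println("1.5.2: " + Arrays.toString(filterByConfidence(tags, probs152, 0.8)));
    }
}
```

With the reported 1.5.1 values every tag passes a 0.8 cutoff, while with the reported 1.5.2 values none does, which would explain tags being ignored downstream.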


Cheers,
Jeyendran


-----Original Message-----
From: Jörn Kottmann [mailto:[email protected]] 
Sent: Tuesday, May 22, 2012 12:48 AM
To: [email protected]
Subject: Re: POS Tagger Probability changes between OpenNLP 1.5.1 and 1.5.2

Hello,

That looks strange to me; I don't really know which changes caused this.
Did you run both tests on the same machine?

I will try to reproduce your results.

Thanks for reporting this!

Jörn

On 05/21/2012 01:49 PM, Rupert Westenthaler wrote:
> Hi,
>
> While debugging why POS tags have recently been ignored by the Apache 
> Stanbol Enhancer, I noticed that the reason was that with OpenNLP
> 1.5.2 the probabilities returned by the POS tagger have changed.
>
> Previously, typical probabilities of POS tags were 0.9+ for most of 
> the tokens. Because of that, a configuration that ignores POS tags
> below 0.8 looked like a reasonable default. However, with OpenNLP 1.5.2 
> probabilities are much lower. At first it even looks as if 1.5.2 
> now returns the uncertainty ('1-{probability}') instead of the 
> probability, but after looking a little into the source this also 
> seems unlikely to me.
>
> I have already searched the Documentation and recent Jira Issues, but 
> I could not find anything related.
>
> As an example, here are the results for a single sentence analyzed 
> using OpenNLP 1.5.1 and 1.5.2.
>
> Sentence:
>
>      A nice travel to the biggest volcano of Mexico.
>
> Tokens are as expected
>
> With OpenNLP 1.5.1 I get the following top sequence when calling
> POSTaggerME#topKSequences(tokens):
>
> -0.0011259470521596032 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>
> Detailed Probabilities:
>
> [1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0, 
> 0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716, 
> 0.9999998327848956]
>
> Switching to OpenNLP 1.5.2 results in
>
> -30.89400016135042 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>
> Detailed Probabilities:
>
> [0.05013598125548828, 0.053016102976047086, 0.04032588713661259, 
> 0.03995389549856565, 0.04685198986899964, 0.03659501930208113, 
> 0.04132356969119329, 0.06434037591280849, 0.046311143933396866, 
> 0.04233395769746884]
>
>
> Is this a bug or an intentional change? If the latter, it would be 
> great if someone could provide a link to the documentation.
>
> best
> Rupert Westenthaler
>
>
> p.s:
>
> with OpenNLP 1.5.1 I refer to
>
>      opennlp-tools-1.5.1-incubating.jar
>      opennlp-maxent-3.0.1-incubating.jar
>
> with OpenNLP 1.5.2 I refer to
>
>      opennlp-tools-1.5.2-incubating.jar
>      opennlp-maxent-3.0.2-incubating.jar
>
> In both cases the "en-pos-maxent.bin" as available via openly.sf.org 
> is used
>
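
One thing that can be checked from the numbers in the report alone: in both versions the score printed in front of the top sequence appears to be the sum of the natural logarithms of the per-token probabilities. The sketch below only redoes that arithmetic on the reported values; SequenceScoreCheck and sumOfLogs are made-up names for illustration and nothing here calls OpenNLP.

```java
import java.util.Locale;

class SequenceScoreCheck {

    // Per-token probabilities reported in the thread for OpenNLP 1.5.1.
    static final double[] PROBS_1_5_1 = {
        1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
        0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
        0.9999998327848956
    };

    // Per-token probabilities reported in the thread for OpenNLP 1.5.2.
    static final double[] PROBS_1_5_2 = {
        0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
        0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
        0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
        0.04233395769746884
    };

    /** Sums the natural logarithms of the given probabilities. */
    static double sumOfLogs(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Reported sequence scores: -0.0011259470521596032 and -30.89400016135042.
        System.out.printf(Locale.ROOT, "1.5.1 sum of logs: %.10f%n", sumOfLogs(PROBS_1_5_1));
        System.out.printf(Locale.ROOT, "1.5.2 sum of logs: %.10f%n", sumOfLogs(PROBS_1_5_2));
    }
}
```

If that relationship holds, the drop in the sequence score and the drop in the per-token probabilities are one symptom rather than two, and the low values come from the per-token distributions themselves rather than from how the sequence score is computed.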
