FWIW, I'm using the POS tagger in 1.5.2 and getting probabilities between 0.5 and 1 for most tags [tested on a few hundred sentences]. I call tagger.tag(String[] tokens) to get the POS tags, and tagger.probs() immediately afterwards to get the probabilities.
Cheers,
Jeyendran

-----Original Message-----
From: Jörn Kottmann [mailto:[email protected]]
Sent: Tuesday, May 22, 2012 12:48 AM
To: [email protected]
Subject: Re: POS Tagger Probability changes between OpenNLP 1.5.1 and 1.5.2

Hello,

that looks strange to me, I don't really know which changes caused this.
Did you run both tests on the same machine?

I will try to reproduce your results. Thanks for reporting this!

Jörn

On 05/21/2012 01:49 PM, Rupert Westenthaler wrote:
> Hi,
>
> While debugging why POS tags have recently been ignored by the Apache
> Stanbol Enhancer, I noticed that the reason was that with OpenNLP
> 1.5.2 the probabilities returned by the POS tagger have changed.
>
> Previously, typical probabilities of POS tags were > 0.9 for most of
> the tokens. Because of that, a configuration that ignores POS tags
> with a probability < 0.8 looked like a reasonable default. However,
> with OpenNLP 1.5.2 the probabilities are much lower. At first it even
> looks as if 1.5.2 now returns the uncertainty ('1-{probability}')
> instead of the probability, but after looking a little into the source
> this also seems unlikely to me.
>
> I have already searched the documentation and recent Jira issues, but
> I could not find anything related.
>
> As an example, here are the results for a single sentence analyzed
> using OpenNLP 1.5.1 and 1.5.2.
>
> Sentence:
>
>     A nice travel to the biggest volcano of Mexico.
>
> Tokens are as expected.
>
> With OpenNLP 1.5.1 I get the following top sequence when calling
> POSTaggerME#topKSequences(tokens):
>
>     -0.0011259470521596032 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>
> Detailed probabilities:
>
>     [1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
>     0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
>     0.9999998327848956]
>
> Switching to OpenNLP 1.5.2 results in
>
>     -30.89400016135042 [DT, JJ, NN, TO, DT, JJS, NN, IN, NNP, .]
>
> Detailed probabilities:
>
>     [0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
>     0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
>     0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
>     0.04233395769746884]
>
> Is this a bug or an intentional change? If the latter, it would be
> great if someone could provide a link to the documentation.
>
> best
> Rupert Westenthaler
>
> p.s.:
>
> With OpenNLP 1.5.1 I refer to
>
>     opennlp-tools-1.5.1-incubating.jar
>     opennlp-maxent-3.0.1-incubating.jar
>
> With OpenNLP 1.5.2 I refer to
>
>     opennlp-tools-1.5.2-incubating.jar
>     opennlp-maxent-3.0.2-incubating.jar
>
> In both cases the "en-pos-maxent.bin" as available via openly.sf.org
> is used.
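
One thing that can be checked directly from the numbers quoted above: in both versions, the score printed in front of the top sequence agrees with the sum of the natural logs of the detailed probabilities. So the sequence score and the per-token values are at least consistent with each other in 1.5.2, which suggests the drop is in the per-token probabilities themselves rather than in how the score is reported. Below is a minimal sketch of that check in plain Java. It has no OpenNLP dependency, the class and method names are made up for illustration, and the arrays are simply copied from Rupert's mail.

```java
// Hypothetical helper, not part of OpenNLP: checks that a reported sequence
// score equals the sum of the natural logs of the per-token probabilities.
final class SequenceScoreCheck {

    // Values reported for OpenNLP 1.5.1 (copied from the quoted mail).
    static final double SCORE_151 = -0.0011259470521596032;
    static final double[] PROBS_151 = {
        1.0, 1.0, 0.9999999952604672, 0.9999999999971082, 1.0,
        0.9988748880601196, 0.9999999702598833, 1.0, 0.9999999999989716,
        0.9999998327848956
    };

    // Values reported for OpenNLP 1.5.2 (copied from the quoted mail).
    static final double SCORE_152 = -30.89400016135042;
    static final double[] PROBS_152 = {
        0.05013598125548828, 0.053016102976047086, 0.04032588713661259,
        0.03995389549856565, 0.04685198986899964, 0.03659501930208113,
        0.04132356969119329, 0.06434037591280849, 0.046311143933396866,
        0.04233395769746884
    };

    // Sum of natural logarithms of the given probabilities.
    static double sumLog(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Print the reported score next to the recomputed log-sum.
        System.out.printf("1.5.1: reported=%.12f  sum(log p)=%.12f%n",
                SCORE_151, sumLog(PROBS_151));
        System.out.printf("1.5.2: reported=%.12f  sum(log p)=%.12f%n",
                SCORE_152, sumLog(PROBS_152));
    }
}
```

If the same check is run on the output of tagger.probs() for other sentences, it should show quickly whether a given setup falls into the 1.5.1-like or the 1.5.2-like range reported here.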
