Hi Richard,

I cloned the DKPro code and tried Rodrigo's proposed changes. Your test passes with them.
Thank you,
William

2017-05-15 18:51 GMT-03:00 Rodrigo Agerri <rage...@apache.org>:
> Hello Richard,
>
> I have tried with various corpora, including GUM, but I cannot reproduce
> that error.
>
> https://github.com/apache/opennlp/commit/8a3b3b537a30b14c4ffb5eb32ffa41d5027bddad
>
> Please note that commit O-904 changed (broke) the lemmatizer API
> substantially to make it uniform between DictionaryLemmatizer and
> LemmatizerME (e.g., the lemmas are now decoded internally), so this line
> for tagging with LemmatizerME is no longer required:
>
> https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizer.java#L135
>
> That commit also changed the LemmaSampleStream and LemmaSample classes, so
> it is possible that this is affecting this class:
>
> https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/opennlp/internal/CasLemmaSampleStream.java
>
> If I understand the logic of this class correctly, as it stands it will
> take an already encoded SES and try to encode it again?
>
> Could you please take a look and see if that could be the problem?
>
> Cheers,
>
> Rodrigo
>
> On Mon, May 15, 2017 at 6:21 PM, Richard Eckart de Castilho <r...@apache.org> wrote:
>
> > > On 15.05.2017, at 16:35, Joern Kottmann <kottm...@gmail.com> wrote:
> > >
> > > Richard, I believe I found the problem with the parser. Would you mind
> > > taking a look?
> > >
> > > This PR should fix it:
> > > https://github.com/apache/opennlp/pull/199
> >
> > The parser test works nicely with the PR.
> >
> > The lemmatizer test still behaves strangely.
> >
> > Cheers,
> >
> > -- Richard
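Rodrigo's double-encoding hypothesis can be illustrated with a small, self-contained Java sketch. It assumes a hypothetical, simplified edit-script format ("D<n>A<suffix>": delete the last n characters of the word, then append the suffix). This is NOT OpenNLP's actual SES encoding, and `encode`/`decode` are illustrative helpers, not OpenNLP or DKPro APIs. The point is only the failure mode: if a sample stream encodes a value that is already a script, the model learns to predict the script itself, so a lemmatizer that decodes internally hands back the script string instead of the lemma.

```java
public class SesDoubleEncodingSketch {

    // Hypothetical, simplified edit script: "D<n>A<suffix>" means
    // "delete the last n characters of the word, then append <suffix>".
    // This is NOT OpenNLP's real SES format; it only mimics the idea.
    static String encode(String word, String lemma) {
        // Length of the common prefix shared by word and lemma.
        int p = 0;
        int max = Math.min(word.length(), lemma.length());
        while (p < max && word.charAt(p) == lemma.charAt(p)) {
            p++;
        }
        return "D" + (word.length() - p) + "A" + lemma.substring(p);
    }

    static String decode(String word, String script) {
        // The first 'A' after the digits is always the separator.
        int a = script.indexOf('A');
        int del = Integer.parseInt(script.substring(1, a));
        return word.substring(0, word.length() - del) + script.substring(a + 1);
    }

    public static void main(String[] args) {
        String word = "running";
        String lemma = "run";

        // Correct: the gold lemma is encoded exactly once.
        String once = encode(word, lemma);
        System.out.println(once + " -> " + decode(word, once));

        // Suspected bug: an already encoded script is encoded a second time,
        // so decoding recovers the script, not the lemma.
        String twice = encode(word, once);
        System.out.println(twice + " -> " + decode(word, twice));
    }
}
```

Running `main` prints `D4A -> run` for the single encoding and `D7AD4A -> D4A` for the double encoding, i.e. the "lemma" comes back as an edit script, which is the kind of strange output a lemmatizer test would flag.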