Hi Rodrigo,

On 15.05.2017, at 15:36, Rodrigo Agerri <rage...@apache.org> wrote:
>
> I cannot reproduce the lemmatizer issue. Could you please share your
> training data?

I have observed the change in behavior via the OpenNlpLemmatizerTrainerTest
in DKPro Core [1]. It happens when I change the OpenNLP version in the POM
from 1.7.2 to 1.8.0 (after including the OpenNLP staging Maven repo, of
course).

Unfortunately, it's not a simple, minimal OpenNLP-only unit test; it makes
use of the respective DKPro Core UIMA components.

The data used is the GUM 3.0.0 corpus, specifically the CoNLL files in
it [2]. The corpus can be downloaded from:

https://github.com/amir-zeldes/gum/archive/V3.0.0.zip

Cheers,

-- Richard

[1] https://github.com/dkpro/dkpro-core/blob/89f144a63b214cd584b3cd0e6c499dff6cbcd9ca/dkpro-core-opennlp-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/opennlp/OpenNlpLemmatizerTrainerTest.java
[2] https://github.com/dkpro/dkpro-core/blob/master/dkpro-core-api-datasets-asl/src/main/resources/de/tudarmstadt/ukp/dkpro/core/api/datasets/lib/gum-en-conll-3.0.0.yaml