Hi,

On Mon, Oct 6, 2014 at 5:41 PM, Jörn Kottmann <[email protected]> wrote:

> Isn't that how it is implemented today? The feature generators can't be
> shared and therefore we have the createFeatureGenerators method in the
> TokenNameFinderFactory which creates a new feature generator every time
> one is needed.
> That one tries to read the xml descriptor from the model and creates the
> feature generators.

Yes, but with one exception: everything goes well until it reaches line 361 of NameFinderME:

    return new TokenNameFinderModel(languageCode, nameFinderModel, beamSize, null,
        factory.getResources(), manifestInfoEntries, factory.getSequenceCodec());

That "null" parameter is the featureGenerator. The init() method in the TokenNameFinderModel class gets that null and returns the default feature generator. What is needed is to pass the featureGenerator created by the TokenNameFinder.createContext() as a parameter. That is why I added a getter in the TokenNameFinderFactory for the field private byte[] featureGeneratorBytes. With that getter added, I create the TokenNameFinderModel above in NameFinderME like this:

    return new TokenNameFinderModel(languageCode, nameFinderModel, beamSize,
        factory.getFeatureGenerator(), factory.getResources(), manifestInfoEntries,
        factory.getSequenceCodec());

and it all works as expected.

> I will try to reproduce the bug you see.
>
> How can I do that?
>
> First train a model with this command:
> bin/opennlp TokenNameFinderTrainer -featuregen bigram.xml -factory
> opennlp.tools.namefind.TokenNameFinderFactory -sequenceCodec BIO
> -params lang/ml/PerceptronTrainerParams.txt -lang nl -model test.bin
> -data ~/experiments/nerc/opennlp/data/nl/conll2002/nl_opennlp.testa.train
>
> and this feature generator config:
> <generators>
>   <cache>
>     <generators>
>       <window prevLength = "2" nextLength = "2">
>         <tokenclass/>
>       </window>
>       <window prevLength = "2" nextLength = "2">
>         <token/>
>       </window>
>       <definition/>
>       <prevmap/>
>       <bigram/>
>       <sentence begin="true" end="false"/>
>       <prefix/>
>       <suffix/>
>     </generators>
>   </cache>
> </generators>
>
> Did you use the command line tool for the evaluation too?
> Maybe you can post the command for that.

Yes, and then try training with the default featureGenerator in the lang/en/namefind directory.

bin/opennlp TokenNameFinderEvaluator -model test.bin -data ~/experiments/nerc/opennlp/data/nl/conll2002/opennlp-nl.testb

Cheers,

Rodrigo
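
P.S. To make the null fallback described above concrete, here is a minimal, self-contained sketch. It does not use the OpenNLP classes: SketchModel, SketchFactory, DEFAULT_DESCRIPTOR and the two train methods are hypothetical stand-ins, and the only assumption carried over from the discussion is that a null descriptor makes the model fall back to a default feature generator.

```java
import java.nio.charset.StandardCharsets;

public class FeatureGenFallbackSketch {

    // Stand-in for the built-in descriptor used when none is supplied (hypothetical value).
    static final String DEFAULT_DESCRIPTOR = "<generators><default/></generators>";

    // Hypothetical stand-in for the model: it only remembers which descriptor it ended up with.
    static final class SketchModel {
        private final String descriptor;

        SketchModel(byte[] generatorDescriptor) {
            // A null descriptor silently selects the default feature generator.
            this.descriptor = (generatorDescriptor == null)
                    ? DEFAULT_DESCRIPTOR
                    : new String(generatorDescriptor, StandardCharsets.UTF_8);
        }

        String descriptor() {
            return descriptor;
        }
    }

    // Hypothetical stand-in for the factory that holds the bytes of the custom descriptor.
    static final class SketchFactory {
        private final byte[] featureGeneratorBytes;

        SketchFactory(byte[] featureGeneratorBytes) {
            this.featureGeneratorBytes = featureGeneratorBytes;
        }

        // The kind of getter proposed in the mail above.
        byte[] getFeatureGenerator() {
            return featureGeneratorBytes;
        }
    }

    // Mirrors the problematic call: the factory knows the descriptor, but null is passed on.
    static SketchModel trainPassingNull(SketchFactory factory) {
        return new SketchModel(null);
    }

    // Mirrors the proposed fix: the descriptor bytes are handed to the model.
    static SketchModel trainPassingDescriptor(SketchFactory factory) {
        return new SketchModel(factory.getFeatureGenerator());
    }

    public static void main(String[] args) {
        byte[] custom = "<generators><bigram/></generators>".getBytes(StandardCharsets.UTF_8);
        SketchFactory factory = new SketchFactory(custom);

        System.out.println("null passed:       " + trainPassingNull(factory).descriptor());
        System.out.println("descriptor passed: " + trainPassingDescriptor(factory).descriptor());
    }
}
```

In the first case the custom configuration is lost without any error, which would explain why training appears to work while evaluation behaves as if the default feature generator had been used.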
