Re: [opennlp-dev] TokenNameFinderFactory new features and extension

Rodrigo Agerri Mon, 06 Oct 2014 09:01:16 -0700

Hi,

On Mon, Oct 6, 2014 at 5:41 PM, Jörn Kottmann <[email protected]> wrote:
>
> Isn't that how it is implemented today? The feature generators can't be
> shared, and therefore we have the createFeatureGenerators method in the
> TokenNameFinderFactory, which creates a new feature generator every time
> one is needed. That method reads the XML descriptor from the model and
> creates the feature generators.
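The "can't be shared, so build a new one each time" pattern described in the quote can be sketched as follows. This is a minimal illustration, not OpenNLP's code: `SketchFeatureGenerator` and `SketchNameFinderFactory` are hypothetical names. The assumption is that the factory keeps only the raw descriptor bytes and constructs a fresh, stateful generator on every call.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a feature generator. It carries mutable
// per-instance state, which is why one instance must not be shared.
class SketchFeatureGenerator {
    final String descriptor;                       // XML it was built from
    final List<String> cache = new ArrayList<>();  // per-instance state

    SketchFeatureGenerator(String descriptor) {
        this.descriptor = descriptor;
    }
}

// Hypothetical factory: it stores only the descriptor bytes and builds a
// new generator from them on every call, so callers never share state.
class SketchNameFinderFactory {
    private final byte[] featureGeneratorBytes;

    SketchNameFinderFactory(byte[] featureGeneratorBytes) {
        this.featureGeneratorBytes = featureGeneratorBytes;
    }

    SketchFeatureGenerator createFeatureGenerators() {
        // Decode the stored descriptor each time instead of caching a generator.
        String xml = new String(featureGeneratorBytes, StandardCharsets.UTF_8);
        return new SketchFeatureGenerator(xml);
    }
}
```

Two calls to `createFeatureGenerators()` yield distinct objects built from the same descriptor, so state added to one never leaks into the other.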


Yes, but with one exception: everything works until it reaches line
361 of NameFinderME:

  return new TokenNameFinderModel(languageCode, nameFinderModel,
      beamSize, null, factory.getResources(), manifestInfoEntries,
      factory.getSequenceCodec());

That null argument is the feature generator. The init() method in the
TokenNameFinderModel class receives that null and returns the default
feature generator.
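The fallback described above can be reduced to a few lines. This is a hedged sketch, not the real TokenNameFinderModel: `SketchNameFinderModel` and its placeholder `DEFAULT_DESCRIPTOR` are hypothetical, and the assumption is simply that `init()` treats a null descriptor as "use the default".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model illustrating the null fallback: when no descriptor
// is passed in, init() substitutes a built-in default, so any custom
// feature generator configuration used during training is lost.
class SketchNameFinderModel {
    // Placeholder default; not OpenNLP's actual default descriptor.
    static final byte[] DEFAULT_DESCRIPTOR =
        "<generators><default/></generators>".getBytes(StandardCharsets.UTF_8);

    private byte[] featureGeneratorBytes;

    SketchNameFinderModel(byte[] featureGeneratorBytes) {
        init(featureGeneratorBytes);
    }

    private void init(byte[] featureGeneratorBytes) {
        // null means "no custom descriptor", so fall back to the default.
        this.featureGeneratorBytes =
            featureGeneratorBytes != null ? featureGeneratorBytes : DEFAULT_DESCRIPTOR;
    }

    String descriptor() {
        return new String(featureGeneratorBytes, StandardCharsets.UTF_8);
    }
}
```

Passing null gives the default descriptor regardless of what was configured at training time, which matches the symptom reported here.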

What is needed is to pass the feature generator created by
TokenNameFinder.createContext() as a parameter. That is why I added a
getter in TokenNameFinderFactory for the private field byte[]
featureGeneratorBytes. With that getter added, I create the
TokenNameFinderModel in NameFinderME (the code above) like this:

  return new TokenNameFinderModel(languageCode, nameFinderModel,
      beamSize, factory.getFeatureGenerator(), factory.getResources(),
      manifestInfoEntries, factory.getSequenceCodec());

and it all works as expected.
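A minimal before/after sketch of this fix, under stated assumptions: `GetterFactory`, `GetterModel` and `TrainSketch` are hypothetical classes, not OpenNLP's. The getter is named `getFeatureGenerator()` to mirror the call in the snippet above and returns the stored `byte[]` descriptor, as the getter for `featureGeneratorBytes` is described.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical factory exposing its descriptor bytes through a getter.
class GetterFactory {
    private final byte[] featureGeneratorBytes;

    GetterFactory(byte[] featureGeneratorBytes) {
        this.featureGeneratorBytes = featureGeneratorBytes;
    }

    byte[] getFeatureGenerator() {
        return featureGeneratorBytes;
    }
}

// Hypothetical model: keeps the descriptor it receives, default on null.
class GetterModel {
    static final String DEFAULT = "<generators><default/></generators>";
    final String descriptor;

    GetterModel(byte[] featureGeneratorBytes) {
        this.descriptor = featureGeneratorBytes == null
            ? DEFAULT
            : new String(featureGeneratorBytes, StandardCharsets.UTF_8);
    }
}

class TrainSketch {
    // Before the fix: null is passed, so the model falls back to the default.
    static GetterModel trainBefore(GetterFactory factory) {
        return new GetterModel(null);
    }

    // After the fix: the factory's descriptor travels into the model.
    static GetterModel trainAfter(GetterFactory factory) {
        return new GetterModel(factory.getFeatureGenerator());
    }
}
```

With a factory built from a custom descriptor, `trainBefore` ends up with the default while `trainAfter` keeps the custom configuration.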

> I will try to reproduce the bug you see.
>
> How can I do that?
>
> First train a model with this command:
>
> bin/opennlp TokenNameFinderTrainer -featuregen bigram.xml -factory
> opennlp.tools.namefind.TokenNameFinderFactory -sequenceCodec BIO
> -params lang/ml/PerceptronTrainerParams.txt -lang nl -model test.bin
> -data ~/experiments/nerc/opennlp/data/nl/conll2002/nl_opennlp.testa.train
>
> and this feature generator config:
> <generators>
>   <cache>
>     <generators>
>       <window prevLength = "2" nextLength = "2">
>         <tokenclass/>
>       </window>
>       <window prevLength = "2" nextLength = "2">
>         <token/>
>       </window>
>       <definition/>
>       <prevmap/>
>       <bigram/>
>       <sentence begin="true" end="false"/>
>       <prefix/>
>       <suffix/>
>     </generators>
>   </cache>
> </generators>
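To check which generators a descriptor like the one above declares, a small sketch using only the JDK's DOM parser may help. It does not build OpenNLP feature generators; `DescriptorLister` is a hypothetical helper. It skips the structural `<generators>` and `<cache>` wrappers and reports each `<window>` with its lengths and inner generator.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists generator elements in a descriptor.
class DescriptorLister {
    static List<String> generatorNames(byte[] descriptor) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(descriptor))
                .getDocumentElement();
            List<String> names = new ArrayList<>();
            collect(root, names);
            return names;
        } catch (Exception ex) {
            throw new IllegalArgumentException("unreadable descriptor", ex);
        }
    }

    // Depth-first walk over element children, ignoring whitespace text nodes.
    private static void collect(Element element, List<String> names) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) continue;
            Element e = (Element) child;
            String tag = e.getTagName();
            if (tag.equals("generators") || tag.equals("cache")) {
                collect(e, names);            // structural wrapper: descend
            } else if (tag.equals("window")) {
                List<String> inner = new ArrayList<>();
                collect(e, inner);            // generator(s) wrapped by the window
                names.add("window(" + e.getAttribute("prevLength") + ","
                    + e.getAttribute("nextLength") + ")" + inner);
            } else {
                names.add(tag);
            }
        }
    }
}
```

For the config above this yields the two windows (over `tokenclass` and `token`) followed by `definition`, `prevmap`, `bigram`, `sentence`, `prefix` and `suffix`.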
>
> Did you use the command line tool for the evaluation too?
> Maybe you can post the command for that.

Yes. Then also try training with the default feature generator in the
lang/en/namefind directory. This is the evaluation command:

bin/opennlp TokenNameFinderEvaluator -model test.bin -data
~/experiments/nerc/opennlp/data/nl/conll2002/opennlp-nl.testb

Cheers,

Rodrigo
