Re: [opennlp-dev] TokenNameFinderFactory new features and extension

Jörn Kottmann Mon, 06 Oct 2014 08:42:55 -0700

On 10/06/2014 04:49 PM, Rodrigo Agerri wrote:

As I said, I have solved issue 717 by adding a getter for the
featureGenerator in the TokenNameFactory and using that getter to
correctly parametrize the creation of the TokenNameFinderModel after
training.

Isn't that how it is implemented today? The feature generators can't be
shared, and therefore we have the createFeatureGenerators method in the
TokenNameFinderFactory, which creates a new feature generator every time
one is needed. That method tries to read the XML descriptor from the
model and creates the feature generators from it.

You say it uses the default feature generation. That can only happen if
the createFeatureGenerators method returns null. Is that true in your
case?
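
To make the question concrete, the behaviour described above can be sketched roughly like this. It is not the OpenNLP source: FeatureGen, NamedGen, SketchFactory and FallbackSketch are hypothetical stand-ins, and the descriptor is reduced to a plain string. The sketch only illustrates that a new generator is built on every call and that the default is used only when the create method returns null.

```java
// Hypothetical stand-in for a feature generator; not an OpenNLP type.
interface FeatureGen {
    String name();
}

// Trivial generator that only remembers where it came from.
final class NamedGen implements FeatureGen {
    private final String name;

    NamedGen(String name) {
        this.name = name;
    }

    @Override
    public String name() {
        return name;
    }
}

// Hypothetical factory. The descriptor string stands in for the XML
// descriptor stored with the model; null means "no descriptor".
final class SketchFactory {
    private final String descriptor;

    SketchFactory(String descriptor) {
        this.descriptor = descriptor;
    }

    // Builds a fresh generator on every call, or returns null when
    // there is no descriptor to build one from.
    FeatureGen createFeatureGenerators() {
        if (descriptor == null) {
            return null;
        }
        return new NamedGen("custom:" + descriptor);
    }
}

final class FallbackSketch {
    // Falls back to a default generator only when the factory returns null.
    static FeatureGen resolve(SketchFactory factory) {
        FeatureGen gen = factory.createFeatureGenerators();
        return gen != null ? gen : new NamedGen("default");
    }

    public static void main(String[] args) {
        System.out.println(resolve(new SketchFactory("bigram.xml")).name());
        System.out.println(resolve(new SketchFactory(null)).name());
    }
}
```

If the real code follows this shape, a model trained with a custom descriptor should only end up with the default features when the descriptor cannot be found at the time the generators are created.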

In which place, exactly, did you add the getter method to fix the
problem, and where in TokenNameFinderModel did you call it? The
TokenNameFinderFactory doesn't have an instance variable called
featureGenerator.

I am just trying to understand how your proposed fix works.

Usually the model is created by using one of the constructors which take
an InputStream, File or URL. Did you use a different constructor to
create the model?

I will try to reproduce the bug you see.

How can I do that?

First train a model with this command:
bin/opennlp TokenNameFinderTrainer -featuregen bigram.xml -factory
opennlp.tools.namefind.TokenNameFinderFactory -sequenceCodec BIO
-params lang/ml/PerceptronTrainerParams.txt -lang nl -model test.bin
-data ~/experiments/nerc/opennlp/data/nl/conll2002/nl_opennlp.testa.train

and this feature generator config:
<generators>
  <cache>
    <generators>
      <window prevLength = "2" nextLength = "2">
        <tokenclass/>
      </window>
      <window prevLength = "2" nextLength = "2">
        <token/>
      </window>
      <definition/>
      <prevmap/>
      <bigram/>
      <sentence begin="true" end="false"/>
      <prefix/>
      <suffix/>
    </generators>
  </cache>
</generators>
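
Before training, it may be worth checking that the descriptor parses and declares the generators you expect. The following is a minimal sketch using only the JDK XML parser; DescriptorCheck and elementNames are hypothetical names, and the sketch does not go through OpenNLP's own descriptor loading, so it says nothing about how OpenNLP interprets the elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of OpenNLP.
final class DescriptorCheck {

    // Returns the tag names of all elements in document order.
    // Throws IllegalArgumentException if the XML is not well-formed.
    static List<String> elementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            collect(doc.getDocumentElement(), names);
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("descriptor could not be parsed", e);
        }
    }

    // Depth-first walk over element nodes only; text nodes are skipped.
    private static void collect(Element element, List<String> out) {
        out.add(element.getTagName());
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                collect((Element) child, out);
            }
        }
    }

    public static void main(String[] args) {
        // Shortened version of the descriptor above, for illustration only.
        String xml = "<generators><cache><generators>"
            + "<window prevLength = \"2\" nextLength = \"2\"><tokenclass/></window>"
            + "<bigram/>"
            + "</generators></cache></generators>";
        System.out.println(elementNames(xml));
    }
}
```

If bigram is missing from the printed list, the problem is in the descriptor file rather than in the factory.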

Did you use the command line tool for the evaluation too?
Maybe you can post the command for that.

Jörn
