On 10/06/2014 04:49 PM, Rodrigo Agerri wrote:
As I said, I have solved issue 717 by adding a getter for the
featureGenerator in the TokenNameFactory and using that getter to
correctly parametrize the creation of the TokenNameFinderModel after
training.
Isn't that how it is implemented today? The feature generators can't be
shared, and therefore we have the createFeatureGenerators method in the
TokenNameFinderFactory, which creates a new feature generator every time
one is needed. That method tries to read the XML descriptor from the model
and creates the feature generators.
You say it uses the default feature generation; that can only happen if
the createFeatureGenerators method returns null. Is that true in your case?
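To make the mechanism above concrete, here is a minimal Python sketch of the pattern only, not OpenNLP's Java code: a factory method builds a fresh feature generator from the XML descriptor stored with the model on every call, and returns None when no descriptor is present, in which case the caller falls back to default feature generation. The class names, the `artifacts` dictionary and the `"featuregen.xml"` key are hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


class FeatureGenerator:
    """Hypothetical stand-in for a feature generator built from a descriptor."""

    def __init__(self, names: List[str]) -> None:
        self.names = names
        # Per-instance mutable state: one reason instances should not be shared.
        self.cache: Dict[str, list] = {}


class FactorySketch:
    """Hypothetical model-side factory; NOT the real TokenNameFinderFactory."""

    def __init__(self, artifacts: Dict[str, bytes]) -> None:
        # artifacts models the resources packaged inside a trained model
        self.artifacts = artifacts

    def create_feature_generators(self) -> Optional[FeatureGenerator]:
        descriptor = self.artifacts.get("featuregen.xml")  # hypothetical artifact key
        if descriptor is None:
            return None  # no descriptor: the caller must use default feature generation
        root = ET.fromstring(descriptor)
        # Leaf elements are the concrete generators listed in the descriptor.
        names = [el.tag for el in root.iter() if len(el) == 0]
        return FeatureGenerator(names)  # a new instance on every call


def feature_generator_for(factory: FactorySketch) -> FeatureGenerator:
    """Default generation is used only when the factory returns None."""
    fg = factory.create_feature_generators()
    return fg if fg is not None else FeatureGenerator(["default"])


if __name__ == "__main__":
    xml = b"<generators><cache><generators><bigram/><prefix/></generators></cache></generators>"
    factory = FactorySketch({"featuregen.xml": xml})
    a = feature_generator_for(factory)
    b = feature_generator_for(factory)
    print(a.names)  # ['bigram', 'prefix']
    print(a is b)   # False: each call builds a separate generator
    print(feature_generator_for(FactorySketch({})).names)  # ['default']
```

In this sketch the only path to the default generator is an empty return from the factory, which is why checking whether createFeatureGenerators returns null is the first thing to look at.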
Where exactly did you add the getter method to fix the problem, and where
in TokenNameFinderModel did you call it? The TokenNameFinderFactory doesn't
have an instance variable called featureGenerator.
I am just trying to understand how your proposed fix works.
Usually the model is created using one of the constructors that take an
InputStream, File or URL. Did you use a different constructor to create
the model?
I will try to reproduce the bug you see.
How can I do that?
First train a model with this command:
bin/opennlp TokenNameFinderTrainer -featuregen bigram.xml \
    -factory opennlp.tools.namefind.TokenNameFinderFactory -sequenceCodec BIO \
    -params lang/ml/PerceptronTrainerParams.txt -lang nl -model test.bin \
    -data ~/experiments/nerc/opennlp/data/nl/conll2002/nl_opennlp.testa.train
and this feature generator config:
<generators>
  <cache>
    <generators>
      <window prevLength = "2" nextLength = "2">
        <tokenclass/>
      </window>
      <window prevLength = "2" nextLength = "2">
        <token/>
      </window>
      <definition/>
      <prevmap/>
      <bigram/>
      <sentence begin="true" end="false"/>
      <prefix/>
      <suffix/>
    </generators>
  </cache>
</generators>
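As a quick sanity check of the descriptor above, the following Python sketch (standard library only; `summarize_generators` and `describe` are hypothetical helpers, not part of OpenNLP) lists every leaf generator together with the non-container elements that wrap it, for example `window(prevLength=2, nextLength=2) > token`. Running it on the descriptor used for training and on the one used for evaluation is one way to compare the two configurations side by side.

```python
import xml.etree.ElementTree as ET
from typing import List

CONFIG = """<generators>
  <cache>
    <generators>
      <window prevLength = "2" nextLength = "2">
        <tokenclass/>
      </window>
      <window prevLength = "2" nextLength = "2">
        <token/>
      </window>
      <definition/>
      <prevmap/>
      <bigram/>
      <sentence begin="true" end="false"/>
      <prefix/>
      <suffix/>
    </generators>
  </cache>
</generators>"""


def describe(el: ET.Element) -> str:
    """Render an element as tag(attr=value, ...) in source attribute order."""
    if not el.attrib:
        return el.tag
    args = ", ".join(f"{k}={v}" for k, v in el.attrib.items())
    return f"{el.tag}({args})"


def summarize_generators(xml_text: str) -> List[str]:
    """List each leaf generator, prefixed by the wrapping elements (e.g. window)."""
    result: List[str] = []

    def walk(el: ET.Element, wrappers: List[str]) -> None:
        if len(el) == 0:
            result.append(" > ".join(wrappers + [describe(el)]))
            return
        # Simplification in this sketch: generators and cache are treated as
        # containers and left out of the printed path.
        if el.tag in ("generators", "cache"):
            inner = wrappers
        else:
            inner = wrappers + [describe(el)]
        for child in el:
            walk(child, inner)

    walk(ET.fromstring(xml_text), [])
    return result


if __name__ == "__main__":
    for line in summarize_generators(CONFIG):
        print(line)
```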
Did you use the command line tool for the evaluation too?
Maybe you can post the command for that.
Jörn