Re: [opennlp-dev] TokenNameFinderFactory new features and extension

Jörn Kottmann Fri, 03 Oct 2014 03:41:11 -0700

On 10/03/2014 11:58 AM, Rodrigo Agerri wrote:

I have implemented a number of new features for the name finder. These
include Brown clusters features (duplicated per Brown path for each
feature activated involving a token) and Clark cluster features
(similar to the WordClusterFeatureGenerator currently available) among
other local extra features which interact well with the clustering
ones.


I think it will be nice to include them before the new release. I will
open issues about each of them. What do you think?

Yes, please open issues for them. It would be really nice to receive them as a contribution.


There are two things you need to do:
1. Implement the feature generators
- Implement AdaptiveFeatureGenerator, or extend CustomFeatureGenerator if you need to pass parameters to it

2. Implement support for loading and serializing the data they need
- The class holding the data should implement SerializableArtifact
- And if you want to load and use it, the feature generator should implement ArtifactToSerializerMapper, which tells the loader which class to use to read the data file

The above is the procedure you should use if you want to have a real custom feature generator which is not part of the OpenNLP Tools jar.
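The two steps above can be sketched roughly as follows. This is only a self-contained illustration, not OpenNLP code: `TokenFeatureGenerator`, `ClusterLexicon`, `ClusterFeatureGenerator`, and the "token clusterId" file format are hypothetical stand-ins, and the real AdaptiveFeatureGenerator, SerializableArtifact, and ArtifactToSerializerMapper contracts should be checked in the OpenNLP sources before adapting it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterFeatureSketch {

    /** Local stand-in for the feature generator contract; not the OpenNLP interface. */
    interface TokenFeatureGenerator {
        void createFeatures(List<String> features, String[] tokens, int index);
    }

    /** Step 2: the data the generator needs, with load and serialize support. */
    static final class ClusterLexicon {
        private final Map<String, String> tokenToCluster = new LinkedHashMap<>();

        /** Reads one "token clusterId" pair per line; other lines are skipped. */
        static ClusterLexicon load(Reader in) throws IOException {
            ClusterLexicon lexicon = new ClusterLexicon();
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length == 2) {
                    lexicon.tokenToCluster.put(parts[0], parts[1]);
                }
            }
            return lexicon;
        }

        /** Writes the lexicon back in the same format that load() reads. */
        void serialize(Writer out) throws IOException {
            for (Map.Entry<String, String> entry : tokenToCluster.entrySet()) {
                out.write(entry.getKey() + " " + entry.getValue() + "\n");
            }
            out.flush();
        }

        String clusterOf(String token) {
            return tokenToCluster.get(token);
        }
    }

    /** Step 1: the generator adds one feature when the current token has a cluster. */
    static final class ClusterFeatureGenerator implements TokenFeatureGenerator {
        private final ClusterLexicon lexicon;

        ClusterFeatureGenerator(ClusterLexicon lexicon) {
            this.lexicon = lexicon;
        }

        @Override
        public void createFeatures(List<String> features, String[] tokens, int index) {
            String cluster = lexicon.clusterOf(tokens[index]);
            if (cluster != null) {
                features.add("cluster=" + cluster);
            }
        }
    }

    /** Loads the lexicon, round-trips it through serialize/load, then generates features. */
    public static List<String> featuresFor(String lexiconText, String[] tokens, int index) {
        try {
            ClusterLexicon loaded = ClusterLexicon.load(new StringReader(lexiconText));
            StringWriter buffer = new StringWriter();
            loaded.serialize(buffer);
            ClusterLexicon reloaded = ClusterLexicon.load(new StringReader(buffer.toString()));

            List<String> features = new ArrayList<>();
            new ClusterFeatureGenerator(reloaded).createFeatures(features, tokens, index);
            return features;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"in", "Madrid"};
        System.out.println(featuresFor("Madrid 42\n", tokens, 1));
    }
}
```

The point of keeping the data class separate from the generator is that the same lexicon can be written into the model and read back later without the generator knowing about the file format.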

When you contribute it, things are slightly different. You should add an XmlFeatureGeneratorFactory inside the GeneratorFactory class. This factory creates the feature generator based on a defined XML element inside the descriptor.
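As a rough picture of that mapping from an XML element to a feature generator, here is a minimal sketch using only the JDK. It does not reproduce the GeneratorFactory code: `Generator`, `ElementFactory`, the `clark` element name, and its `dict` attribute are all made up for illustration, and it handles a single element rather than a full descriptor.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class DescriptorFactorySketch {

    /** Hypothetical stand-in for a feature generator. */
    interface Generator {
        String describe();
    }

    /** Hypothetical stand-in for a per-element factory. */
    interface ElementFactory {
        Generator create(Element element);
    }

    // Registry from element name to the factory that builds its generator.
    private static final Map<String, ElementFactory> FACTORIES = new HashMap<>();

    static {
        // "clark" and its "dict" attribute are assumed names, not real descriptor elements.
        FACTORIES.put("clark", element -> {
            String dict = element.getAttribute("dict");
            return () -> "clark(" + dict + ")";
        });
    }

    /** Parses a one-element descriptor and builds the generator registered for it. */
    public static String build(String descriptorXml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(descriptorXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Descriptor is not well-formed XML", e);
        }
        ElementFactory factory = FACTORIES.get(root.getTagName());
        if (factory == null) {
            throw new IllegalArgumentException("No factory for element: " + root.getTagName());
        }
        return factory.create(root).describe();
    }

    public static void main(String[] args) {
        System.out.println(build("<clark dict=\"clusters.txt\"/>"));
    }
}
```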

6. *Some* of the new features work. If an Element name in the
descriptor does not match in the GeneratorFactory, then the
TokenNameFinderFactory.createFeatureGenerators() gives a null and the
TokenNameFinderFactory.createContextGenerator() automatically stops
the feature creation and goes for the
NameFinderME.createFeatureGenerator().
Is this the desired behaviour? Perhaps we could add a log somewhere?
To inform of the backoff to the default features if one descriptor
element does not match?

That sounds really bad. If there is a problem in the mapping, it should fail hard and throw an exception. The user should be forced to decide for himself what to do: either fix his descriptor or use the defaults.
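The difference between a silent fallback and failing hard might look like the sketch below. It is not the TokenNameFinderFactory code: the set of known element names, the string "generators", and the choice of IllegalArgumentException are assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FailHardSketch {

    // Illustrative set of element names the factory knows how to map.
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("window", "token", "custom"));

    /** Throws on the first unknown element instead of returning null. */
    public static List<String> createGenerators(List<String> elementNames) {
        List<String> generators = new ArrayList<>();
        for (String name : elementNames) {
            if (!KNOWN.contains(name)) {
                throw new IllegalArgumentException("Unknown descriptor element: " + name);
            }
            generators.add("generator:" + name);
        }
        return generators;
    }

    public static void main(String[] args) {
        try {
            createGenerators(Arrays.asList("token", "brown"));
        } catch (IllegalArgumentException e) {
            // The caller sees the problem and decides explicitly:
            // fix the descriptor, or opt in to the default features.
            System.err.println(e.getMessage());
        }
    }
}
```

With this shape, falling back to the default features becomes a visible choice in the calling code rather than something that happens unnoticed during training.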

Steps 4 and 5 that you describe should not be necessary to add new feature generators.

The idea is that we always use the XML descriptor to define the feature generation. That way we can have different configurations without changing the OpenNLP code itself, and we don't need special user code to integrate a customized name finder model. If a model makes use of external classes, these of course need to be on the classpath, since we can't ship them as part of the model.
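To illustrate why external classes must be on the classpath, here is a small sketch of instantiating a class from a name found in a descriptor. It uses plain JDK reflection and is only an assumption about the general mechanism, not a copy of how OpenNLP loads extensions; `instantiate` is a hypothetical helper.

```java
public class ClasspathLoadSketch {

    /** Instantiates a class by name; fails with a clear message if it is not available. */
    public static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(
                    "Class " + className + " is not on the classpath", e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is always available, so this succeeds.
        System.out.println(instantiate("java.util.ArrayList").getClass().getName());
    }
}
```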

HTH,
Jörn
