On 05/14/2013 11:07 PM, Benson Margulies wrote:
> Folks,
> I expected to see something like a feature generator; something that
> looked at a structure and returned a set of feature activations.
> I don't claim to have much expertise with MEMM, but I sure know one
> end of a perceptron from another.
> Looking, for example, at POSContextGenerator, what is the String[]
> return value? Is it perhaps just a list of named active features? But
> wouldn't you need a count for each one?
Yes, it's a list of all named active features; if a feature is detected
n times, it occurs n times in the list.
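To make the counting explicit, here is a minimal sketch (not OpenNLP code)
that collapses such a String[] into per-feature counts. The class name
FeatureCounts and the feature names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FeatureCounts {
    // Collapse a context array with repeated entries into feature -> count.
    static Map<String, Integer> count(String[] context) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String feature : context) {
            counts.merge(feature, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical feature names; "w=dog" is active twice.
        String[] context = {"w=dog", "suf=og", "w=dog"};
        System.out.println(count(context));
    }
}
```

So the count is implicit in the repetition and can be recovered by
whoever consumes the list.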
We started to work on a feature generation framework
(opennlp.tools.util.featuregen) to make the name finder adaptable.
The original plan was to reuse this work for the POS Tagger and Chunker
as well, but that has not been done yet.
Are you interested in experimenting with your own feature generation?
It's possible to implement a custom POSTaggerFactory, which
can completely customize the feature generation.
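As a rough idea of what custom feature generation for a tagger could
look like, here is a self-contained sketch. It deliberately does not
use the OpenNLP interfaces: SketchContextGenerator, its getContext
signature, and the feature names are all hypothetical stand-ins, and a
real implementation would have to plug into POSTaggerFactory instead.

```java
import java.util.ArrayList;
import java.util.List;

class SketchContextGenerator {
    // Returns named features for tokens[index]. priorTags holds the tags
    // already assigned to the tokens before index. (Hypothetical signature.)
    static String[] getContext(int index, String[] tokens, String[] priorTags) {
        List<String> features = new ArrayList<>();
        String word = tokens[index];
        features.add("w=" + word);
        if (word.length() > 2) {
            // Two-character suffix, only for words longer than two characters.
            features.add("suf=" + word.substring(word.length() - 2));
        }
        // Previous token and previous tag, with "<s>" marking sentence start.
        features.add("prev=" + (index > 0 ? tokens[index - 1] : "<s>"));
        features.add("ptag=" + (index > 0 ? priorTags[index - 1] : "<s>"));
        return features.toArray(new String[0]);
    }
}
```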
At work I use a fork of OpenNLP where the feature generation for the
name finder produces 64-bit hash features instead of Strings.
This works quite a bit faster, and I will probably write up a proposal
at some point and contribute the code, but currently I am short on time.
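For illustration only, hashing feature names to 64-bit values could be
sketched as below. The fork's actual hash function is not stated here;
FNV-1a is simply an assumed choice, and the class name HashedFeatures
is made up.

```java
import java.nio.charset.StandardCharsets;

class HashedFeatures {
    // Standard 64-bit FNV-1a constants (an assumed hash, not the fork's).
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    // Hash one feature name to a 64-bit value.
    static long hash(String feature) {
        long h = FNV_OFFSET;
        for (byte b : feature.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    // Hash a whole context; repeated features yield repeated hashes.
    static long[] hashAll(String[] context) {
        long[] hashed = new long[context.length];
        for (int i = 0; i < context.length; i++) {
            hashed[i] = hash(context[i]);
        }
        return hashed;
    }
}
```

A long[] avoids allocating and comparing Strings, at the cost of
possible (rare) collisions between distinct feature names.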
OpenNLP also has a perceptron; you can select it via a params file
that you pass in during training. Swapping in your own classifier
implementation is not yet possible, but will be in the next release.
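A params file selecting the perceptron might look roughly like this.
The Iterations and Cutoff values are illustrative assumptions, not
recommendations; check the documentation for the release you use.

```
Algorithm=PERCEPTRON
Iterations=100
Cutoff=5
```

With the command line trainers, such a file is typically passed via the
-params option.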
HTH,
Jörn