Thanks for letting me know about JLIS. It looks interesting; I will probably
try it out once we have pluggable classifier support.

Do you know which license it has? I couldn't see it on their page.

Jörn

On 05/15/2013 01:19 PM, Benson Margulies wrote:
At work _I_ have a perceptron framework that makes extensive use of
hashed features (32 bits was enough), and then we are also looking at
the SSVM from the JLIS framework. I particularly recommend the jlis
formalism; it's not explicitly hashed, but it makes it the business of
the individual problem implementation to make these decisions (though
it would require representing hash values as strings across one API).
Sadly, JLIS is not code that can be absorbed in an Apache project.

http://flake.cs.uiuc.edu/~mchang21/softwares/JLIS/ssvm.html

What I am working on right now is not exactly a tagger, it is a
disambiguator. Instead of assigning a tag to each token, it picks a
decomposition for the token from a list. I've applied it to
disambiguating Arabic Buckwalter output and KLEX Korean output.

I have it working with my perceptron framework, but I was interested
in a comparison. But it has a gigantic sparse feature space, so I need
hashing. My name tagging code could be adapted from perceptron to your
framework, but I'm not really motivated to try that out just now.
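
To illustrate the hashing mentioned here, a minimal sketch of a binary
perceptron over hashed features follows. It is not the framework described
in this message and not OpenNLP code: the class HashedPerceptron, its method
names, and the use of String.hashCode() as the 32-bit hash are all
assumptions made for illustration.

```java
// Hypothetical sketch: a binary perceptron whose weights live in a
// fixed-size array indexed by a hash of the feature name, so the
// (possibly gigantic) feature vocabulary is never stored.
final class HashedPerceptron {
    private final float[] weights;

    public HashedPerceptron(int numBuckets) {
        weights = new float[numBuckets];
    }

    // Map a feature name to a weight slot. String.hashCode() is a 32-bit
    // hash; floorMod keeps the index non-negative. Collisions are accepted.
    public int bucket(String feature) {
        return Math.floorMod(feature.hashCode(), weights.length);
    }

    // Sum the weights of all active features (repeats count repeatedly).
    public float score(String[] activeFeatures) {
        float s = 0f;
        for (String f : activeFeatures) {
            s += weights[bucket(f)];
        }
        return s;
    }

    // Mistake-driven update; label must be +1 or -1.
    public void update(String[] activeFeatures, int label) {
        if (label * score(activeFeatures) <= 0f) {
            for (String f : activeFeatures) {
                weights[bucket(f)] += label;
            }
        }
    }
}
```

The memory cost is fixed by numBuckets regardless of how many distinct
features occur; the price is that colliding features share a weight.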




On Wed, May 15, 2013 at 6:34 AM, Jörn Kottmann <[email protected]> wrote:
On 05/14/2013 11:07 PM, Benson Margulies wrote:
Folks,

I expected to see something like a feature generator; something that
looked at a structure and returned a set of feature activations.

I don't claim to have much expertise with MEMM, but I sure know one
end of a perceptron from another.

Looking, for example, at POSContextGenerator, what is the String[]
return value? Is it perhaps just a list of named active features? But
wouldn't you need a count for each one?

Yes, it's a list of all named active features. If a feature is detected n
times, it occurs n times in the list.
We started to work on a feature generation framework
(opennlp.util.featuregen) to make the name finder adaptable.
The original plan was to reuse this work for the POS Tagger and Chunker as
well, but that has not been done yet.
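
The convention described above (a feature detected n times appears n times
in the String[]) means the counts asked about can be recovered from the
array itself. A minimal sketch, assuming nothing about OpenNLP beyond that
convention; the class FeatureCounts and the feature names in the usage note
are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP: collapses a list of active
// feature names (with repeats) into a name -> count map.
final class FeatureCounts {
    public static Map<String, Integer> count(String[] activeFeatures) {
        // LinkedHashMap keeps features in first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String f : activeFeatures) {
            counts.merge(f, 1, Integer::sum); // add 1, or start at 1
        }
        return counts;
    }
}
```

For example, passing {"w=dog", "suf=og", "w=dog"} yields a count of 2 for
"w=dog" and 1 for "suf=og".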

Are you interested in experimenting with your own feature generation? It's
possible to implement a custom POSTaggerFactory which
can completely customize the feature generation.

At work I use a fork of OpenNLP where the feature generation for the name
finder produces 64-bit hash features instead of Strings.
This works quite a bit faster, and I will probably write up a proposal at
some point and contribute the code, but currently I am short on time.
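
As a rough illustration of what a 64-bit hash feature could look like, here
is a minimal sketch. The hash function used in the fork is not named in this
thread; FNV-1a is substituted purely as an assumption, and the class
HashedFeature is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the code from the fork: turns a feature name
// into a 64-bit value using the FNV-1a hash (an assumed choice).
final class HashedFeature {
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    public static long hash64(String feature) {
        long h = FNV_OFFSET_BASIS;
        for (byte b : feature.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff); // mix in one byte, treated as unsigned
            h *= FNV_PRIME;  // multiplication wraps modulo 2^64
        }
        return h;
    }
}
```

A long can then stand in for the String in maps and comparisons, which
avoids string allocation and character-by-character equality checks.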

In OpenNLP we also have a perceptron. You can configure it via a params
file that you pass in during training. Replacing the classifier with your
own implementation is not yet possible, but will be in the next release.

HTH,
Jörn
