Re: The pons asinorum of applying MAXENT to a new structured problem

Benson Margulies Wed, 15 May 2013 04:55:43 -0700

We believe that it's academic-only unless one does a deal with IIT.
However, the author has moved on from IIT, and perhaps you, as an
academic, would have more success in convincing someone there to just
open source it.



On Wed, May 15, 2013 at 7:50 AM, Jörn Kottmann <[email protected]> wrote:
> Thanks for letting me know about JLIS, looks interesting; will probably try
> it out when we have the pluggable classifier support.
>
> Do you know which license it has? I couldn't see it on their page.
>
> Jörn
>
>
> On 05/15/2013 01:19 PM, Benson Margulies wrote:
>>
>> At work _I_ have a perceptron framework that makes extensive use of
>> hashed features (32 bits was enough), and then we are also looking at
>> the SSVM from the JLIS framework. I particularly recommend the JLIS
>> formalism; it's not explicitly hashed, but it makes it the business of
>> the individual problem implementation to make these decisions (though
>> it would require representing hash values as strings across one API).
>> Sadly, JLIS is not code that can be absorbed in an Apache project.
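For readers unfamiliar with hashed features, here is a minimal sketch of a hashed weight table for a linear model such as a perceptron. It is not the framework described above: the class name `HashedWeights`, the use of `String.hashCode()`, the table size, and the feature names are all assumptions chosen for illustration.

```java
// Hypothetical sketch: feature names are hashed into a fixed-size weight table,
// so no dictionary from feature name to index has to be kept in memory.
class HashedWeights {
    // One weight per hash bucket; the table size is a power of two.
    private final float[] weights;

    HashedWeights(int bits) {
        if (bits < 1 || bits > 30) {
            throw new IllegalArgumentException("bits must be in 1..30");
        }
        this.weights = new float[1 << bits];
    }

    // Map a feature name to a bucket. Masking with (size - 1) keeps the
    // index non-negative even when hashCode() is negative.
    int bucket(String feature) {
        return feature.hashCode() & (weights.length - 1);
    }

    // Sum the weights of all active features; a repeated feature counts twice.
    float score(String[] features) {
        float sum = 0f;
        for (String f : features) {
            sum += weights[bucket(f)];
        }
        return sum;
    }

    // Perceptron-style update: add delta to the weight of every active feature.
    void update(String[] features, float delta) {
        for (String f : features) {
            weights[bucket(f)] += delta;
        }
    }

    public static void main(String[] args) {
        HashedWeights w = new HashedWeights(20);
        String[] features = {"prefix=al", "stem=ktb"}; // made-up feature names
        w.update(features, 1.0f);
        System.out.println(w.score(features));
    }
}
```

Distinct features that land in the same bucket share a weight, which is the usual trade-off of hashing: a larger table means fewer collisions at the cost of memory.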
>>
>> http://flake.cs.uiuc.edu/~mchang21/softwares/JLIS/ssvm.html
>>
>> What I am working on right now is not exactly a tagger; it is a
>> disambiguator. Instead of assigning a tag to each token, it picks a
>> decomposition for the token from a list. I've applied it to
>> disambiguating Arabic Buckwalter output and KLEX Korean output.
>>
>> I have it working with my perceptron framework, but I was interested
>> in a comparison. But it has a gigantic sparse feature space, so I need
>> hashing. My name tagging code could be adapted from perceptron to your
>> framework, but I'm not really motivated to try that out just now.
>>
>>
>>
>>
>> On Wed, May 15, 2013 at 6:34 AM, Jörn Kottmann <[email protected]> wrote:
>>>
>>> On 05/14/2013 11:07 PM, Benson Margulies wrote:
>>>>
>>>> Folks,
>>>>
>>>> I expected to see something like a feature generator; something that
>>>> looked at a structure and returned a set of feature activations.
>>>>
>>>> I don't claim to have much expertise with MEMM, but I sure know one
>>>> end of a perceptron from another.
>>>>
>>>> Looking, for example, at POSContextGenerator, what is the String[]
>>>> return value? Is it perhaps just a list of named active features? But
>>>> wouldn't you need a count for each one?
>>>
>>>
>>> Yes, it's a list of all named active features; if a feature is detected n
>>> times, it occurs n times in the list.
>>> We started to work on a feature generation framework
>>> (opennlp.tools.util.featuregen) to make the name finder adaptable.
>>> The original plan was to reuse this work for the POS Tagger and Chunker
>>> as well, but that has not been done yet.
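To make the "occurs n times in the list" convention concrete, the sketch below collapses such a context array into per-feature counts. The class name `FeatureCounts` and the feature names are hypothetical; this only illustrates the convention and is not OpenNLP code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turn a context array with repeated names into counts.
class FeatureCounts {
    static Map<String, Integer> count(String[] context) {
        // LinkedHashMap keeps features in first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String feature : context) {
            counts.merge(feature, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "w=dog" appears twice, so its count is 2.
        String[] context = {"w=dog", "suf=og", "w=dog"};
        System.out.println(count(context));
    }
}
```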
>>>
>>> Are you interested in experimenting with your own feature generation? It's
>>> possible to implement a custom POSTaggerFactory which
>>> can completely customize the feature generation.
>>>
>>> At work I use a fork of OpenNLP where the feature generation for the name
>>> finder produces 64-bit hash features instead of Strings.
>>> This works quite a bit faster, and I will probably write up a proposal at
>>> some point and contribute the code, but currently I am short on time.
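The message does not say which hash function the fork uses. As one possible illustration, the sketch below maps a feature string to a 64-bit value with FNV-1a, a technique swapped in here purely as an example; the class name `FeatureHash64` and the feature names are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: 64-bit FNV-1a over the UTF-8 bytes of a feature name.
class FeatureHash64 {
    private static final long OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long PRIME = 0x100000001b3L;

    static long hash(String feature) {
        long h = OFFSET_BASIS;
        for (byte b : feature.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff); // mix in one byte, treated as unsigned
            h *= PRIME;      // multiply; overflow wraps modulo 2^64
        }
        return h;
    }

    public static void main(String[] args) {
        // Comparing long values avoids String allocation and comparison.
        System.out.println(Long.toHexString(hash("w=dog")));
    }
}
```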
>>>
>>> In OpenNLP we also have a perceptron; you can configure it via a params
>>> file that you pass in during training. Replacing the classifier with your
>>> own implementation is not yet possible, but will be in the next release.
>>>
>>> HTH,
>>> Jörn
>
>
