I am still getting familiar with how the project is structured internally, 
but I typically like to separate frameworks from implementations, so perhaps 
one package that holds the factories, interfaces, and the like, and another 
for the implementations?

opennlp.tools.ml.framework
opennlp.tools.ml.impls
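As a rough illustration of that split (all type names below are hypothetical, not existing OpenNLP classes), the framework-style package would hold only the contracts, and the impls-style package the concrete trainers:

```java
import java.util.Map;

// Hypothetical contents of an opennlp.tools.ml.framework-style package:
// interfaces and factories only, no concrete learning code.
interface ClassificationModel {
    String bestOutcome(String[] context);
}

interface TrainerFactory {
    ClassificationModel train(Map<String, String> trainParams);
}

// Hypothetical contents of an opennlp.tools.ml.impls-style package:
// a trivial "classifier" that always predicts one outcome, standing in
// for a real maxent or perceptron trainer.
class ConstantModel implements ClassificationModel {
    private final String outcome;
    ConstantModel(String outcome) { this.outcome = outcome; }
    public String bestOutcome(String[] context) { return outcome; }
}

class ConstantTrainerFactory implements TrainerFactory {
    public ClassificationModel train(Map<String, String> trainParams) {
        // Fall back to a default outcome if the params do not name one.
        return new ConstantModel(trainParams.getOrDefault("Outcome", "other"));
    }
}
```

With that layout, components depend only on the framework types, so an implementation can be swapped without touching component code.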

Let me know if I can help.


Mark Giaconia 


-----Original Message-----
From: Samik Raychaudhuri [mailto:[email protected]] 
Sent: Friday, May 31, 2013 5:39 PM
To: [email protected]
Subject: [External] Re: Pluggable Machine Learning support

Yep, supporting the move to a new package/namespace.

On 5/31/2013 12:40 AM, Tommaso Teofili wrote:
> big +1!
>
> Tommaso
>
>
> 2013/5/31 William Colen <[email protected]>
>
>> I don't see any issue. People who use Maxent directly would need to 
>> change how they use it, but that is OK for a major release.
>>
>>
>>
>>
>> On Thu, May 30, 2013 at 5:56 PM, Jörn Kottmann <[email protected]> wrote:
>>
>>> Are there any objections to moving the maxent/perceptron classes to 
>>> an opennlp.tools.ml package as part of this issue? Moving them would 
>>> avoid a second interface layer and probably make using OpenNLP Tools 
>>> a bit easier, because then we would be down to a single jar.
>>>
>>> Jörn
>>>
>>>
>>> On 05/30/2013 08:57 PM, William Colen wrote:
>>>
>>>> +1 to add pluggable machine learning algorithms
>>>> +1 to improve the API and remove deprecated methods in 1.6.0
>>>>
>>>> You can assign related Jira issues to me and I will be glad to help.
>>>>
>>>>
>>>> On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We have spoken about this here and there already: to ensure that 
>>>>> OpenNLP can stay competitive with other NLP libraries, I am 
>>>>> proposing to make the machine learning pluggable.
>>>>>
>>>>> The extensions should not make it harder to use OpenNLP: if a user 
>>>>> loads a model, OpenNLP should be able to set everything up by 
>>>>> itself, without forcing the user to write custom integration code 
>>>>> for the specific ML implementation.
>>>>> We have already solved this problem with the extension mechanism we 
>>>>> built to support the customization of our components, and I suggest 
>>>>> that we reuse this mechanism to load an ML implementation. To use a 
>>>>> custom ML implementation, the user has to specify the class name of 
>>>>> its factory in the Algorithm field of the params file. The params 
>>>>> file is available at both training and tagging time.
>>>>>
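The loading path described above might look roughly like this (a minimal sketch; the `TrainerFactory` interface, the class names, and the params keys are assumptions for illustration, not the actual OpenNLP extension API):

```java
import java.util.Map;

// Hypothetical factory interface; stands in for whatever the real
// extension point ends up being.
interface TrainerFactory {
    Object train(Map<String, String> trainParams);
}

// A trivial factory used only to demonstrate the loading path.
class DummyTrainerFactory implements TrainerFactory {
    public Object train(Map<String, String> trainParams) {
        return "trained-with-" + trainParams.get("Iterations");
    }
}

class AlgorithmLoader {
    // Reads the factory class name from the "Algorithm" field of the
    // params and instantiates it reflectively, mirroring how the
    // existing extension mechanism loads user-supplied classes.
    static TrainerFactory load(Map<String, String> params) {
        try {
            String className = params.get("Algorithm");
            return (TrainerFactory) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "Cannot load Algorithm factory", e);
        }
    }
}
```

Because the factory is named in the params file, the same file drives both training and tagging without any ML-specific wiring in user code.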
>>>>> Most components in the tools package use the maxent library to do 
>>>>> classification. The Java interfaces for this are currently located 
>>>>> in the maxent package; to be able to swap the implementation, these 
>>>>> interfaces should be defined inside the tools package. To make 
>>>>> things easier, I propose moving the maxent and perceptron 
>>>>> implementations as well.
>>>>>
>>>>> Throughout the code base we use AbstractModel, which is a bit 
>>>>> unfortunate, because the only reason for it is the lack of model 
>>>>> serialization support in the MaxentModel interface. A serialization 
>>>>> method should be added to that interface, and it could perhaps be 
>>>>> renamed to ClassificationModel. This will break backward 
>>>>> compatibility in non-standard use cases.
>>>>>
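The renamed interface could look roughly like this (hypothetical shape; the real MaxentModel interface has more methods, and a production serialize method would likely declare IOException rather than wrap it):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical renamed interface; method names are assumptions, not
// the actual OpenNLP 1.6 API.
interface ClassificationModel {
    double[] eval(String[] context);

    // The missing piece mentioned above: letting every model serialize
    // itself removes the need to downcast to AbstractModel.
    void serialize(OutputStream out);
}

// A toy model demonstrating that serialization now lives on the
// interface rather than on one abstract base class.
class TinyModel implements ClassificationModel {
    public double[] eval(String[] context) {
        return new double[] { 1.0 };
    }
    public void serialize(OutputStream out) {
        try {
            out.write("tiny-model-v1".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```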
>>>>> To be able to test the extension mechanism, I suggest that we 
>>>>> implement an addon which integrates the liblinear and Apache 
>>>>> Mahout classifiers.
>>>>>
>>>>> There are still a few deprecated 1.4 constructors and methods in 
>>>>> OpenNLP which directly reference interfaces and classes in the 
>>>>> maxent library; these need to be removed before the interfaces can 
>>>>> be moved to the tools package.
>>>>>
>>>>> Any opinions?
>>>>>
>>>>> Jörn
>>>>>
>>>>>
