I am still becoming familiar with the way the project is internally structured, but I typically like to separate frameworks from implementations, so perhaps a framework package that holds factories and interfaces and the like, and another for implementations?
opennlp.tools.ml.framework
opennlp.tools.ml.impls

Let me know if I can help.

Mark Giaconia

-----Original Message-----
From: Samik Raychaudhuri [mailto:[email protected]]
Sent: Friday, May 31, 2013 5:39 PM
To: [email protected]
Subject: [External] Re: Pluggable Machine Learning support

Yep, supporting the move to a new package/namespace.

On 5/31/2013 12:40 AM, Tommaso Teofili wrote:
> big +1!
>
> Tommaso
>
> 2013/5/31 William Colen <[email protected]>
>
>> I don't see any issue. People who use Maxent directly would need to
>> change how they use it, but that is OK for a major release.
>>
>> On Thu, May 30, 2013 at 5:56 PM, Jörn Kottmann <[email protected]> wrote:
>>
>>> Are there any objections to moving the maxent/perceptron classes to an
>>> opennlp.tools.ml package as part of this issue? Moving them would
>>> avoid a second interface layer and probably make using OpenNLP Tools
>>> a bit easier, because then we are down to a single jar.
>>>
>>> Jörn
>>>
>>> On 05/30/2013 08:57 PM, William Colen wrote:
>>>
>>>> +1 to add pluggable machine learning algorithms
>>>> +1 to improve the API and remove deprecated methods in 1.6.0
>>>>
>>>> You can assign related Jira issues to me and I will be glad to help.
>>>>
>>>> On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We have spoken about it here and there already: to ensure that
>>>>> OpenNLP can stay competitive with other NLP libraries, I am
>>>>> proposing to make the machine learning pluggable.
>>>>>
>>>>> The extensions should not make it harder to use OpenNLP. If a user
>>>>> loads a model, OpenNLP should be capable of setting up everything
>>>>> by itself, without forcing the user to write custom integration
>>>>> code based on the ML implementation.
>>>>> We solved this problem already with the extension mechanism we
>>>>> built to support the customization of our components; I suggest
>>>>> that we reuse this extension mechanism to load an ML
>>>>> implementation. To use a custom ML implementation, the user has to
>>>>> specify the class name of the factory in the Algorithm field of
>>>>> the params file. The params file is available during training and
>>>>> tagging time.
>>>>>
>>>>> Most components in the tools package use the maxent library to do
>>>>> classification. The Java interfaces for this are currently located
>>>>> in the maxent package; to be able to swap the implementation, the
>>>>> interfaces should be defined inside the tools package. To make
>>>>> things easier, I propose to move the maxent and perceptron
>>>>> implementations as well.
>>>>>
>>>>> Throughout the code base we use AbstractModel. That is a bit
>>>>> unlucky, because the only reason for it is the lack of model
>>>>> serialization support in the MaxentModel interface. A
>>>>> serialization method should be added to it, and it could maybe be
>>>>> renamed to ClassificationModel. This will break backward
>>>>> compatibility in non-standard use cases.
>>>>>
>>>>> To be able to test the extension mechanism, I suggest that we
>>>>> implement an addon which integrates liblinear and the Apache
>>>>> Mahout classifiers.
>>>>>
>>>>> There are still a few deprecated 1.4 constructors and methods in
>>>>> OpenNLP which directly reference interfaces and classes in the
>>>>> maxent library; these need to be removed to be able to move the
>>>>> interfaces to the tools package.
>>>>>
>>>>> Any opinions?
>>>>>
>>>>> Jörn
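The extension mechanism described in the quoted proposal (resolving a trainer factory from the Algorithm field of the params file, falling back to reflection for custom class names) could be sketched roughly as below. All names here (MLTrainerFactory, MaxentTrainerFactory, createFactory) are illustrative assumptions, not the actual OpenNLP API:

```java
import java.util.Map;

// Hypothetical sketch of the proposed extension mechanism; the type and
// method names are made up for illustration, not taken from OpenNLP.
public class PluggableTrainerDemo {

    /** Minimal factory contract a pluggable ML implementation would provide. */
    public interface MLTrainerFactory {
        String algorithmName();
    }

    /** Stand-in built-in implementation, as the bundled maxent trainer might be. */
    public static class MaxentTrainerFactory implements MLTrainerFactory {
        public String algorithmName() { return "MAXENT"; }
    }

    /**
     * Resolve the factory from the "Algorithm" training parameter: a known
     * built-in name maps to the bundled implementation, anything else is
     * treated as a fully qualified factory class name and loaded via
     * reflection, so an addon (e.g. liblinear or Mahout) plugs in without
     * custom integration code on the user's side.
     */
    public static MLTrainerFactory createFactory(Map<String, String> trainParams) {
        String algorithm = trainParams.getOrDefault("Algorithm", "MAXENT");
        if ("MAXENT".equals(algorithm)) {
            return new MaxentTrainerFactory();
        }
        try {
            // Custom implementation: the params file carries the factory class name.
            Class<?> factoryClass = Class.forName(algorithm);
            return (MLTrainerFactory) factoryClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load factory: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        MLTrainerFactory factory = createFactory(Map.of("Algorithm", "MAXENT"));
        System.out.println(factory.algorithmName());
    }
}
```

Because the same params file is available at both training and tagging time, a model package could carry the factory class name with it and be restored by the same lookup.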

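The MaxentModel change discussed above (adding a serialization method to the interface, possibly renamed ClassificationModel, so components no longer depend on AbstractModel) might look something like this sketch. The interface shape and the ConstantModel stand-in are assumptions for illustration, not the real OpenNLP types:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative-only sketch of the proposed interface change; names and
// signatures are guesses, not committed OpenNLP API.
public class ClassificationModelSketch {

    /**
     * What the model interface might look like after the proposed rename,
     * with serialization available on the interface itself instead of only
     * on the AbstractModel base class.
     */
    public interface ClassificationModel {
        double[] eval(String[] context);
        String getBestOutcome(double[] outcomes);
        void serialize(OutputStream out) throws IOException; // new in the proposal
    }

    /** Trivial stand-in model so the sketch is runnable. */
    public static class ConstantModel implements ClassificationModel {
        public double[] eval(String[] context) { return new double[] { 1.0 }; }
        public String getBestOutcome(double[] outcomes) { return "other"; }
        public void serialize(OutputStream out) throws IOException {
            out.write("constant-model".getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        ClassificationModel model = new ConstantModel();
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Callers can persist any model through the interface,
        // without casting down to a concrete base class.
        model.serialize(buf);
        System.out.println(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

As the proposal notes, moving serialization into the interface is what breaks backward compatibility for non-standard users who implemented the old interface directly.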