I don't see any issue. People who use Maxent directly would need to change how they use it, but that is OK for a major release.
On Thu, May 30, 2013 at 5:56 PM, Jörn Kottmann <[email protected]> wrote:
> Are there any objections to moving the maxent/perceptron classes to an
> opennlp.tools.ml package as part of this issue? Moving them would avoid a
> second interface layer and probably make using OpenNLP Tools a bit easier,
> because then we are down to a single jar.
>
> Jörn
>
> On 05/30/2013 08:57 PM, William Colen wrote:
>
>> +1 to add pluggable machine learning algorithms
>> +1 to improve the API and remove deprecated methods in 1.6.0
>>
>> You can assign related Jira issues to me and I will be glad to help.
>>
>> On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> we spoke about it here and there already. To ensure that OpenNLP can
>>> stay competitive with other NLP libraries, I am proposing to make the
>>> machine learning pluggable.
>>>
>>> The extensions should not make it harder to use OpenNLP: if a user loads
>>> a model, OpenNLP should be capable of setting up everything by itself
>>> without forcing the user to write custom integration code based on the ml
>>> implementation. We already solved this problem with the extension
>>> mechanism we built to support the customization of our components, and I
>>> suggest that we reuse this extension mechanism to load an ml
>>> implementation. To use a custom ml implementation, the user has to
>>> specify the class name of the factory in the Algorithm field of the
>>> params file. The params file is available at both training and tagging
>>> time.
>>>
>>> Most components in the tools package use the maxent library to do
>>> classification. The Java interfaces for this are currently located in the
>>> maxent package; to be able to swap the implementation, the interfaces
>>> should be defined inside the tools package. To make things easier, I
>>> propose to move the maxent and perceptron implementations as well.
>>>
>>> Throughout the code base we use AbstractModel. That's a bit unfortunate,
>>> because the only reason for this is the lack of model serialization
>>> support in the MaxentModel interface. A serialization method should be
>>> added to it, and it should maybe be renamed to ClassificationModel. This
>>> will break backward compatibility in non-standard use cases.
>>>
>>> To be able to test the extension mechanism, I suggest that we implement
>>> an addon which integrates liblinear and the Apache Mahout classifiers.
>>>
>>> There are still a few deprecated 1.4 constructors and methods in OpenNLP
>>> which directly reference interfaces and classes in the maxent library.
>>> These need to be removed to be able to move the interfaces to the tools
>>> package.
>>>
>>> Any opinions?
>>>
>>> Jörn
>>>
>
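A minimal Java sketch of the proposal, for illustration only. The name ClassificationModel comes from the message above, but its methods, the ClassifierFactory interface, loadFactory and ConstantFactory are hypothetical names and are not the actual OpenNLP API. The sketch assumes the params file can be read as plain key=value pairs (here via java.util.Properties) and that the Algorithm field holds a fully qualified factory class name that is instantiated by reflection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PluggableMlSketch {

    // Hypothetical shape of the proposed ClassificationModel: MaxentModel-style
    // evaluation methods plus the serialization method the proposal asks for.
    public interface ClassificationModel {
        double[] eval(String[] context);
        String getOutcome(int index);
        void serialize(OutputStream out) throws IOException;
    }

    // Hypothetical factory interface that a custom ml implementation
    // (for example a liblinear addon) would implement.
    public interface ClassifierFactory {
        ClassificationModel createModel();
    }

    // Reads the factory class name from the "Algorithm" field and
    // instantiates it through its public no-arg constructor.
    public static ClassifierFactory loadFactory(Properties params)
            throws ReflectiveOperationException {
        String className = params.getProperty("Algorithm");
        if (className == null) {
            throw new IllegalArgumentException("params file has no Algorithm field");
        }
        Class<?> cls = Class.forName(className);
        return (ClassifierFactory) cls.getDeclaredConstructor().newInstance();
    }

    // Toy stand-in for a real learner: always predicts the outcome "O".
    public static class ConstantFactory implements ClassifierFactory {
        @Override
        public ClassificationModel createModel() {
            return new ClassificationModel() {
                @Override
                public double[] eval(String[] context) {
                    return new double[] {1.0};
                }

                @Override
                public String getOutcome(int index) {
                    return "O";
                }

                @Override
                public void serialize(OutputStream out) throws IOException {
                    out.write("constant:O".getBytes(StandardCharsets.UTF_8));
                }
            };
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates a params file whose Algorithm field names the factory class.
        Properties params = new Properties();
        params.load(new StringReader("Algorithm=" + ConstantFactory.class.getName() + "\n"));

        ClassificationModel model = loadFactory(params).createModel();
        double[] probs = model.eval(new String[] {"w=OpenNLP"});
        System.out.println(model.getOutcome(0) + " " + probs[0]);

        // The model can write itself out without callers needing AbstractModel.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        model.serialize(bytes);
        System.out.println(new String(bytes.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Running main prints "O 1.0" followed by "constant:O". Because the factory is resolved from a class name stored in the params, the same lookup can run at training and at tagging time, so a user loading a model does not have to write integration code for the chosen ml implementation.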
