Are there any objections to moving the maxent/perceptron classes to an
opennlp.tools.ml package as part of this issue? Moving them would avoid a
second interface layer and would probably make OpenNLP Tools a bit easier
to use, because we would then be down to a single jar.
Jörn
On 05/30/2013 08:57 PM, William Colen wrote:
+1 to add pluggable machine learning algorithms
+1 to improve the API and remove deprecated methods in 1.6.0
You can assign related Jira issues to me and I will be glad to help.
On Thu, May 30, 2013 at 11:53 AM, Jörn Kottmann <[email protected]> wrote:
Hi all,
We have already spoken about this here and there. To ensure that OpenNLP
can stay competitive with other NLP libraries, I am proposing to make the
machine learning pluggable.
The extensions should not make OpenNLP harder to use. If a user loads a
model, OpenNLP should be capable of setting up everything by itself,
without forcing the user to write custom integration code for the ml
implementation.
We already solved this problem with the extension mechanism we built to
support the customization of our components. I suggest that we reuse this
extension mechanism to load an ml implementation. To use a custom ml
implementation, the user has to specify the class name of the factory in
the Algorithm field of the params file. The params file is available at
both training and tagging time.
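As a rough sketch of the loading step, the lookup could be done along the
following lines in Java. This is not existing OpenNLP code: TrainerFactory,
MaxentTrainerFactory and TrainerFactoryLoader are hypothetical names, and
treating a missing Algorithm value or the value MAXENT as the built-in
default is an assumption made for the example.

```java
import java.util.Map;

// Hypothetical extension point; the name is an assumption, not OpenNLP API.
interface TrainerFactory {
    String describe();
}

// Hypothetical built-in implementation used when no custom class is named.
class MaxentTrainerFactory implements TrainerFactory {
    public String describe() {
        return "maxent";
    }
}

class TrainerFactoryLoader {
    static final String ALGORITHM_PARAM = "Algorithm";

    // Resolves the factory from the params: built-in default, or a class name.
    static TrainerFactory load(Map<String, String> params) {
        String algorithm = params.get(ALGORITHM_PARAM);
        if (algorithm == null || algorithm.equals("MAXENT")) {
            return new MaxentTrainerFactory();
        }
        try {
            // Treat any other value as the class name of a custom factory.
            Class<?> clazz = Class.forName(algorithm);
            return (TrainerFactory) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot load ml implementation: " + algorithm, e);
        }
    }
}
```

Because the class name travels with the params, the same lookup can run at
training time and again when a model is loaded, so the user does not have
to write any integration code.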
Most components in the tools package use the maxent library to do
classification. The Java interfaces for this are currently located in the
maxent package. To be able to swap the implementation, the interfaces
should be defined inside the tools package. To make things easier, I
propose to move the maxent and perceptron implementations as well.
Throughout the code base we use AbstractModel. That is a bit unfortunate,
because the only reason for it is the lack of model serialization support
in the MaxentModel interface. A serialization method should be added to
that interface, and it could perhaps be renamed to ClassificationModel.
This will break backward compatibility in non-standard use cases.
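To illustrate the idea, here is a minimal Java sketch of an interface that
carries its own serialization method. The method names and the toy
UniformModel class are assumptions made for the example and are not the
actual MaxentModel signatures.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical shape of the interface; method names are assumptions.
interface ClassificationModel {
    double[] eval(String[] context);

    String getOutcome(int index);

    int getNumOutcomes();

    // With this on the interface, callers no longer need a concrete base class.
    void serialize(OutputStream out) throws IOException;
}

// Toy implementation, for illustration only: every outcome is equally likely.
class UniformModel implements ClassificationModel {
    private final String[] outcomes;

    UniformModel(String... outcomes) {
        this.outcomes = outcomes.clone();
    }

    public double[] eval(String[] context) {
        double[] probs = new double[outcomes.length];
        Arrays.fill(probs, 1.0 / outcomes.length);
        return probs;
    }

    public String getOutcome(int index) {
        return outcomes[index];
    }

    public int getNumOutcomes() {
        return outcomes.length;
    }

    public void serialize(OutputStream out) throws IOException {
        // Write the outcome count followed by each outcome label.
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(outcomes.length);
        for (String outcome : outcomes) {
            data.writeUTF(outcome);
        }
        data.flush();
    }
}
```

Code that only holds a ClassificationModel reference could then write the
model out without knowing which ml implementation produced it.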
To be able to test the extension mechanism I suggest that we implement an
addon which integrates liblinear and the Apache Mahout classifiers.
There are still a few deprecated 1.4 constructors and methods in OpenNLP
which directly reference interfaces and classes in the maxent library.
These need to be removed before the interfaces can be moved to the tools
package.
Any opinions?
Jörn