Hi all,

we have spoken about this here and there already: to ensure that OpenNLP stays competitive with other NLP libraries, I am proposing to make the machine learning pluggable.

The extensions should not make OpenNLP harder to use: if a user loads a model, OpenNLP should be able to set everything up by itself, without forcing the user to write custom integration code for the ml implementation. We solved this problem already with the extension mechanism we built to support the customization of our components, and I suggest we reuse it to load an ml implementation. To use a custom ml implementation, the user specifies the class name of the factory in the Algorithm field of the params file. The params file is available during both training and tagging time.
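For illustration, today the params file names a built-in algorithm, e.g. Algorithm=MAXENT. With this proposal the field could instead carry the factory class name (the class name below is made up):

  Algorithm=org.example.ml.LiblinearTrainerFactory
  Iterations=100
  Cutoff=5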

Most components in the tools package use the maxent library for classification. The Java interfaces for this currently live in the maxent package; to be able to swap the implementation, the interfaces should be defined inside the tools package. To make things easier, I propose to move the maxent and perceptron implementations as well.

Throughout the code base we use AbstractModel, which is a bit unlucky; the only reason for it is the lack of model serialization support in the MaxentModel interface. A serialization method should be added to the interface, and it could maybe be renamed to ClassificationModel. This will break backward compatibility in non-standard use cases.
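To make the idea concrete, here is a minimal sketch of how such an interface could look; the names and signatures are only a suggestion, not existing API:

  import java.io.IOException;
  import java.io.OutputStream;

  public interface ClassificationModel {

    // evaluation methods carried over from today's MaxentModel
    double[] eval(String[] context);
    String getBestOutcome(double[] outcomes);

    // new: lets every ml implementation persist its model
    // in its own format
    void serialize(OutputStream out) throws IOException;
  }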

To be able to test the extension mechanism, I suggest we implement an addon which integrates the liblinear and Apache Mahout classifiers, as sketched below.
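Building on the ClassificationModel sketch above, such an addon would only have to ship a factory which the extension mechanism instantiates by the class name from the Algorithm field. All names in this sketch are made up:

  import java.io.IOException;
  import java.util.Map;
  import opennlp.model.EventStream;

  // produces a trainer configured from the params file entries
  interface TrainerFactory {
    EventTrainer createTrainer(Map<String, String> trainParams);
  }

  // trains a ClassificationModel from a stream of training events
  interface EventTrainer {
    ClassificationModel train(EventStream events) throws IOException;
  }

OpenNLP could then obtain the factory at training time through the existing loader, e.g. ExtensionLoader.instantiateExtension(TrainerFactory.class, algorithmName).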

There are still a few deprecated 1.4 constructors and methods in OpenNLP which directly reference interfaces and classes in the maxent library; these need to be removed before the interfaces can be moved to the tools package.

Any opinions?

Jörn
