+1 to have support for pluggable machine learning.

We should not drop the current maxent/perceptron implementation.
I think it's great for getting users started, and it works nicely for
many use cases. But some people want to experiment with more
sophisticated ML, and that can be done nicely with existing libraries
plus some OpenNLP integration code.
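To make the pluggability idea concrete, here is a minimal sketch of what
such an abstraction could look like. None of these names (EventTrainer,
ClassificationModel, MajorityTrainer) exist in OpenNLP; they are invented
for illustration, with a trivial majority-class trainer standing in for a
real backend such as the built-in maxent trainer or a Liblinear wrapper:

```java
import java.util.*;

// Hypothetical trainer abstraction: the built-in maxent/perceptron code
// and external libraries would each provide an implementation.
interface EventTrainer {
    ClassificationModel train(List<Event> events);
}

// A training event: an outcome plus its context features.
final class Event {
    final String outcome;
    final String[] context;
    Event(String outcome, String[] context) {
        this.outcome = outcome;
        this.context = context;
    }
}

interface ClassificationModel {
    String bestOutcome(String[] context);
}

// Trivial stand-in backend: always predicts the most frequent outcome.
final class MajorityTrainer implements EventTrainer {
    public ClassificationModel train(List<Event> events) {
        Map<String, Integer> counts = new HashMap<>();
        for (Event e : events) counts.merge(e.outcome, 1, Integer::sum);
        String best = Collections.max(
                counts.entrySet(), Map.Entry.comparingByValue()).getKey();
        return context -> best;  // ClassificationModel as a lambda
    }
}

public class PluggableMlSketch {
    public static void main(String[] args) {
        // Swapping in a Liblinear-backed trainer would only change this line.
        EventTrainer trainer = new MajorityTrainer();
        ClassificationModel model = trainer.train(List.of(
                new Event("noun", new String[] {"w=dog"}),
                new Event("noun", new String[] {"w=cat"}),
                new Event("verb", new String[] {"w=run"})));
        System.out.println(model.bestOutcome(new String[] {"w=dog"}));
    }
}
```

The point of the interface is that callers depend only on train() and
bestOutcome(), so the learning backend becomes a configuration choice.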

We will hopefully soon have support for L-BFGS training in
our maxent implementation too (see OPENNLP-338).

Jörn

On 08/30/2012 04:36 PM, Jason Baldridge wrote:
I've recently been using the Java port of Liblinear for training logistic
regression (maxent) models, and am really happy with it -- it uses TRON, a
method that is much faster than L-BFGS. Jorn and I have discussed having
OpenNLP shed the maxent code and just use wrappers around better, more
up-to-date libraries like Liblinear. I don't have time to do this for
OpenNLP right now, but if anyone wants to take a crack at it, here's some
Scala code I wrote for Breeze that has most of what you'd need to set
this up:

https://github.com/dlwh/breeze/blob/master/learn/src/main/scala/breeze/classify/Liblinear.scala

It includes an example for reading in data that is the same as the prep
attach data used for testing in OpenNLP, so that should make things
reasonably clear for someone with a decent understanding of Scala syntax.
(I'm happy to help with any questions about it.)

More links for context:

    - The main C++ Liblinear page, with lots of documentation including
    papers: http://www.csie.ntu.edu.tw/~cjlin/liblinear/
    - The Java port (I'm currently using 1.8, but 1.91 was just wrapped up):
    http://www.bwaldvogel.de/liblinear-java/

Liblinear has support for several regularization and loss regimes for
both logistic regression and SVMs. OpenNLP would do well to hook into
all of that!
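For a sense of what the Java side of such a wrapper involves, here is a
minimal sketch against liblinear-java. This is a sketch, not OpenNLP code:
the toy data is invented for illustration, and class and field names
(Problem, FeatureNode, SolverType.L2R_LR, the double[] y field) follow
recent liblinear-java releases and may differ in older versions such as
1.8:

```java
import de.bwaldvogel.liblinear.*;

public class LiblinearMaxentSketch {
    public static void main(String[] args) {
        // Toy training set: two one-feature examples with labels 1 and 0.
        Problem problem = new Problem();
        problem.l = 2;  // number of training examples
        problem.n = 1;  // number of features
        problem.x = new Feature[][] {
            { new FeatureNode(1, 1.0) },   // feature indices are 1-based
            { new FeatureNode(1, -1.0) }
        };
        problem.y = new double[] { 1, 0 };

        // L2-regularized logistic regression (the "maxent" case),
        // solved with TRON; C = 1.0, stopping tolerance eps = 0.01.
        Parameter param = new Parameter(SolverType.L2R_LR, 1.0, 0.01);
        Model model = Linear.train(problem, param);

        double pred = Linear.predict(
                model, new Feature[] { new FeatureNode(1, 1.0) });
        System.out.println(pred);  // label 1.0 for this separable toy set
    }
}
```

An OpenNLP wrapper would mainly have to map its string-based context
features onto the 1-based integer feature indices that Problem expects,
and map predicted label values back to outcome names.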

Jason

