+1 to have support for pluggable machine learning. We should not drop the current maxent/perceptron implementation; I think it's great for users to get started, and it works nicely for many use cases. But some people want to play around with more sophisticated ML, and that can be done nicely with existing libraries and some OpenNLP integration code.
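To make that concrete, the integration point could be as small as an adapter contract that components train and classify through, with the built-in maxent/perceptron trainers and external libraries plugging in behind it. The sketch below is purely hypothetical; all the names are invented for illustration and this is not an existing OpenNLP API:

    // Hypothetical plug-in contract, loosely modeled on the eval(String[])
    // style of OpenNLP's MaxentModel. All names here are invented for
    // illustration; this is not an existing OpenNLP API.

    /** One training example: an outcome label plus its feature context. */
    final class TrainingEvent {
        final String outcome;
        final String[] context;
        TrainingEvent(String outcome, String[] context) {
            this.outcome = outcome;
            this.context = context;
        }
    }

    /** What components code against at classification time. */
    interface PluggableClassifier {
        /** Probability distribution over outcomes for a feature context. */
        double[] eval(String[] context);
        /** Label for an index into the distribution returned by eval. */
        String getOutcome(int index);
    }

    /** Each backend (built-in maxent, a Liblinear wrapper, ...) supplies one. */
    interface PluggableTrainer {
        PluggableClassifier train(Iterable<TrainingEvent> events);
    }

A Liblinear backend, for example, would map the string feature contexts to its sparse integer features inside its PluggableTrainer implementation.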
We will hopefully soon have support for L-BFGS training in our maxent implementation too (see OPENNLP-338).

Jörn

On 08/30/2012 04:36 PM, Jason Baldridge wrote:
I've recently been using the Java port of Liblinear for training logistic regression (maxent) models, and am really happy with it -- it uses TRON, a method that is much faster than L-BFGS.

Jorn and I have discussed having OpenNLP shed the maxent code and just use wrappers around better, more up-to-date libraries like Liblinear. I don't have time to do this for OpenNLP right now, but if anyone wants to take a crack at it, here's some Scala code that I did for Breeze that has most of what you'd need to set this up:

https://github.com/dlwh/breeze/blob/master/learn/src/main/scala/breeze/classify/Liblinear.scala

It includes an example for reading in data that is the same as the prep-attach data used for testing in OpenNLP, so that should make things reasonably clear for someone with a decent understanding of Scala syntax. (I'm happy to help with any questions about it.)

More links for context:

- The main C++ Liblinear page, with lots of documentation including papers: http://www.csie.ntu.edu.tw/~cjlin/liblinear/
- The Java port (I'm currently using 1.8, but 1.91 was just wrapped up): http://www.bwaldvogel.de/liblinear-java/

Liblinear supports several regularization regimes for both logistic regression and SVMs. OpenNLP would do well to hook into all that!

Jason
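For anyone who wants to try this, a minimal sketch of training an L2-regularized logistic regression model through liblinear-java might look like the following. It is written against the 1.8 API mentioned above (in newer releases Problem.y is a double[] and the predict methods return double, so adjust accordingly), and the two toy training instances are made up for illustration:

    import de.bwaldvogel.liblinear.FeatureNode;
    import de.bwaldvogel.liblinear.Linear;
    import de.bwaldvogel.liblinear.Model;
    import de.bwaldvogel.liblinear.Parameter;
    import de.bwaldvogel.liblinear.Problem;
    import de.bwaldvogel.liblinear.SolverType;

    import java.util.Arrays;

    public class LiblinearLRSketch {
        public static void main(String[] args) {
            // Two toy training instances over a three-feature space.
            // Liblinear feature indices are 1-based and must be sorted
            // in ascending order within each row.
            Problem problem = new Problem();
            problem.l = 2;     // number of training instances
            problem.n = 3;     // number of features
            problem.bias = -1; // negative = no bias feature
            problem.x = new FeatureNode[][] {
                { new FeatureNode(1, 1.0), new FeatureNode(3, 1.0) },
                { new FeatureNode(2, 1.0) }
            };
            problem.y = new int[] { 1, 2 }; // class labels (int[] in 1.8)

            // L2-regularized logistic regression, solved with TRON.
            // C = 1.0 is the regularization trade-off, eps = 0.01 the
            // stopping tolerance.
            Parameter param = new Parameter(SolverType.L2R_LR, 1.0, 0.01);
            Model model = Linear.train(problem, param);

            // Class probabilities are available because L2R_LR is a
            // logistic regression solver.
            FeatureNode[] instance = { new FeatureNode(1, 1.0) };
            double[] probs = new double[model.getNrClass()];
            int label = Linear.predictProbability(model, instance, probs);
            System.out.println("label " + label + ", P = " + Arrays.toString(probs));
        }
    }

Swapping in one of the SVM solvers is just a matter of passing a different SolverType, though probability estimates are only meaningful for the logistic regression ones.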