On 04/29/2013 01:43 AM, Andy McMurry wrote:
I encourage committers to check out Apache Mahout:
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms

Why Apache Mahout?
1. provides ML classifiers and functions not available through UIMA
2. parallel by design, transparently invokes Hadoop
3. Java and Apache license (every other known toolkit is GPL!)
4. likely to become the standard ML package for Apache

Why would we use Mahout in cTAKES?
cTAKES models are "provided", for example for PoS tagging.
Retraining these models on your own compute cluster would be difficult (in my opinion).
LibSVM is nice, but it is only one classification method.


The Mahout classifiers will probably be integrated into OpenNLP soon; here is the JIRA issue:
https://issues.apache.org/jira/browse/OPENNLP-574

The idea is to make the ML part of OpenNLP pluggable, so that all kinds of classification libraries can be supported.
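
To make the pluggable idea a bit more concrete, here is a rough Java sketch. All names in it (Classifier, ClassifierTrainer, MajorityLabelTrainer, TrainerRegistry) are made up for illustration; this is not the OpenNLP or Mahout API and not what OPENNLP-574 will necessarily look like. The point is only that the calling code depends on a small interface, while the backend is looked up by name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction, NOT the actual OpenNLP or Mahout API.
// A trained model that maps a feature vector to a label.
interface Classifier {
    String classify(String[] features);
}

// Hypothetical abstraction: one implementation per ML backend
// (e.g. a wrapper around a Mahout classifier, LibSVM, maxent, ...).
interface ClassifierTrainer {
    Classifier train(String[][] featureSets, String[] labels);
}

// Toy backend used only to show the plumbing: it ignores the features
// and always predicts the most frequent training label.
final class MajorityLabelTrainer implements ClassifierTrainer {
    @Override
    public Classifier train(String[][] featureSets, String[] labels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : labels) {
            int count = counts.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = label;
            }
        }
        final String majority = best;
        return features -> majority;
    }
}

// Backends are registered under a name, so the algorithm can be
// selected from configuration instead of being hard-coded.
final class TrainerRegistry {
    private final Map<String, ClassifierTrainer> trainers = new HashMap<>();

    void register(String name, ClassifierTrainer trainer) {
        trainers.put(name, trainer);
    }

    ClassifierTrainer get(String name) {
        ClassifierTrainer trainer = trainers.get(name);
        if (trainer == null) {
            throw new IllegalArgumentException("Unknown trainer: " + name);
        }
        return trainer;
    }
}
```

A caller would register a backend once, e.g. registry.register("majority", new MajorityLabelTrainer()), and later ask for registry.get("majority").train(...). Swapping in another library then only means registering a different ClassifierTrainer under a different name.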

Mahout's clustering and LDA capabilities might also be interesting, since they could probably be run on the
entire document database.

Jörn
