On 04/29/2013 01:43 AM, Andy McMurry wrote:
I encourage committers to check out Apache Mahout:
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
Why Apache Mahout?
1. provides ML classifiers and functions not available through UIMA
2. parallel by design, transparently invokes Hadoop
3. Java and Apache licensed (every other toolkit I know of is GPL!)
4. likely to become the standard ML package for Apache
Why would we use Mahout in cTAKES?
cTAKES models are "provided", for example for PoS tagging.
Retraining these models on your own compute cluster would be difficult (in my
opinion).
LibSVM is nice, but it is only one classification method.
The Mahout classifiers will probably be integrated into OpenNLP soon;
here is the JIRA issue:
https://issues.apache.org/jira/browse/OPENNLP-574
The idea is to make the ML part of OpenNLP pluggable, so that all kinds
of classification libraries can be supported.
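
To make the pluggable idea concrete, here is a minimal sketch. It is
written in Python only for brevity, and every name in it (Classifier,
FeatureVoteClassifier, Tagger, token_features) is hypothetical; none of
this is the OpenNLP or Mahout API, and the design in the JIRA issue may
look quite different. The point is only that the tagger depends on a
small interface, so the backend behind it can be swapped.

```python
from abc import ABC, abstractmethod
from collections import Counter, defaultdict


class Classifier(ABC):
    """Hypothetical plugin interface (an assumption, not a real OpenNLP API)."""

    @abstractmethod
    def train(self, samples):
        """Train on an iterable of (features, label) pairs."""

    @abstractmethod
    def predict(self, features):
        """Return the predicted label for a list of feature strings."""


class FeatureVoteClassifier(Classifier):
    """Toy backend: each feature votes for the labels it was seen with.

    A stand-in for a real library (maxent, LibSVM, Mahout, ...).
    """

    def __init__(self):
        self.votes = defaultdict(Counter)  # feature -> label counts
        self.label_counts = Counter()      # overall label frequencies

    def train(self, samples):
        for features, label in samples:
            self.label_counts[label] += 1
            for feature in features:
                self.votes[feature][label] += 1

    def predict(self, features):
        tally = Counter()
        for feature in features:
            tally.update(self.votes.get(feature, {}))
        if not tally:
            # No known feature: fall back to the most frequent label.
            return self.label_counts.most_common(1)[0][0]
        return tally.most_common(1)[0][0]


def token_features(token):
    """Hypothetical feature extractor: lowercased form and 2-char suffix."""
    lowered = token.lower()
    return ["lower=" + lowered, "suffix=" + lowered[-2:]]


class Tagger:
    """Depends only on the Classifier interface, so backends are swappable."""

    def __init__(self, classifier):
        self.classifier = classifier

    def train(self, tagged_tokens):
        self.classifier.train(
            (token_features(token), tag) for token, tag in tagged_tokens
        )

    def tag(self, token):
        return self.classifier.predict(token_features(token))
```

Supporting another library would then mean writing one more Classifier
subclass and passing it to Tagger, without touching the tagger itself.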
Mahout's clustering and LDA capabilities might also be interesting;
they could probably be run over the entire document database.
Jörn