Re: roadmap for Apache cTakes "big data" processing

Andy McMurry Sun, 28 Apr 2013 16:44:21 -0700

I encourage committers to check out Apache Mahout:
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms

Why Apache Mahout? 
1. Provides ML classifiers and functions not available through UIMA
2. Parallel by design; transparently invokes Hadoop
3. Java, under the Apache license (every other toolkit I know of is GPL!)
4. Likely to become the standard ML package for Apache

Why would we use Mahout in cTakes?
cTakes models, for example the PoS tagging model, are "provided".
Retraining these models on your own compute cluster would be difficult (in my
opinion).
LibSVM is nice, but it is only one classification method.

When?
No rush. However, I suggest we don't invest time in porting SINGLE-CPU
classifier functions that we will have to parallelize later.

Summary: 
UIMA + Mahout = pipelines + classification

On Apr 28, 2013, at 4:26 PM, "Savova, Guergana" 
<[email protected]> wrote:

> +1 
> --guergana
> 
> -----Original Message-----
> From: Kaggal, Vinod C. [mailto:[email protected]] 
> Sent: Saturday, April 27, 2013 11:21 PM
> To: <[email protected]>
> Cc: <[email protected]>
> Subject: Re: roadmap for Apache cTakes "big data" processing
> 
> +1
> 
> 
> On Apr 27, 2013, at 9:05 PM, "Chen, Pei" <[email protected]> 
> wrote:
> 
>> +1 for UIMA-AS
>> 
>> 
>> On Apr 27, 2013, at 9:25 PM, "Andy McMurry" <[email protected]> wrote:
>> 
>>> I'm writing to gauge community interest and intent for parallel processing 
>>> with cTakes. 
>>> 
>>> Apache UIMA is planning "Async Scaleout" as a replacement for CPM. 
>>> http://uima.apache.org/doc-uimaas-what.html
>>> 
>>> Apache Mahout is likely to become the de facto Apache package for machine
>>> learning. 
>>> http://mahout.apache.org/
>>> 
>>> I believe cTakes will embrace both of these in due time.  
>>> Do you agree or do you have a different view? 
>>> 
