Good point, Pei. We would need to do a spike (a short sprint) in the future to see whether Mahout would be a good fit. I ask because I'm planning how I will use cTakes, and I'd like to know how others are planning as well.
Cheers,
--Andy

On Apr 28, 2013, at 5:39 PM, "Chen, Pei" <[email protected]> wrote:

> Has anyone tried Mahout recently?
> Last time I tried, it was still closely tied to the Hadoop file system.
>
> Sent from my iPhone
>
> On Apr 28, 2013, at 7:44 PM, "Andy McMurry" <[email protected]> wrote:
>
>> I encourage committers to check out Apache Mahout
>> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
>>
>> Why Apache Mahout?
>> 1. provides ML classifiers and functions not available through UIMA
>> 2. parallel by design, transparently invokes Hadoop
>> 3. Java and Apache license (every other known toolkit is GPL!)
>> 4. likely to become standard ML package for Apache
>>
>> Why would we use Mahout in cTakes?
>> cTakes models are "provided", for example PoS tagging.
>> Retraining these models on your own compute cluster would be difficult (in
>> my opinion).
>> LibSVM is nice, but it is only one classification method.
>>
>> When?
>> No rush; however, I suggest we don't invest time in porting SINGLE-CPU
>> classifier functions that we will have to parallelize later.
>>
>> Summary:
>> UIMA + Mahout = pipelines + classification
>>
>> On Apr 28, 2013, at 4:26 PM, "Savova, Guergana"
>> <[email protected]> wrote:
>>
>>> +1
>>> --guergana
>>>
>>> -----Original Message-----
>>> From: Kaggal, Vinod C. [mailto:[email protected]]
>>> Sent: Saturday, April 27, 2013 11:21 PM
>>> To: <[email protected]>
>>> Cc: <[email protected]>
>>> Subject: Re: roadmap for Apache cTakes "big data" processing
>>>
>>> +1
>>>
>>>
>>> On Apr 27, 2013, at 9:05 PM, "Chen, Pei" <[email protected]>
>>> wrote:
>>>
>>>> +1 for UIMA-AS
>>>>
>>>>
>>>> On Apr 27, 2013, at 9:25 PM, "Andy McMurry" <[email protected]> wrote:
>>>>
>>>>> I'm writing to gauge community interest and intent for parallel
>>>>> processing with cTakes.
>>>>>
>>>>> Apache UIMA is planning "Async Scaleout" as a replacement for CPM.
>>>>> http://uima.apache.org/doc-uimaas-what.html
>>>>>
>>>>> Apache Mahout is likely to become the de facto Apache package for machine
>>>>> learning.
>>>>> http://mahout.apache.org/
>>>>>
>>>>> I believe cTakes will embrace both of these in due time.
>>>>> Do you agree or do you have a different view?
>>
