This could be a nice project for GSoC. If you have time to help mentor a student, feel free to create a Jira issue and tag it with gsoc2013.
--Pei
> -----Original Message-----
> From: Andy McMurry [mailto:[email protected]]
> Sent: Sunday, April 28, 2013 9:40 PM
> To: [email protected]
> Subject: Re: roadmap for Apache cTakes "big data" processing
>
> Good point, Pei.
>
> We would need to do a spike (short sprint) in the future to see if Mahout would be a good fit.
> I'm just wondering because I'm planning out how I will be using cTakes, and was wondering how others are planning as well.
>
> Cheers,
> --Andy
>
> On Apr 28, 2013, at 5:39 PM, "Chen, Pei" <[email protected]> wrote:
>
> > Has anyone tried Mahout recently?
> > Last time I tried, it was still closely tied to the Hadoop file system.
> >
> > Sent from my iPhone
> >
> > On Apr 28, 2013, at 7:44 PM, "Andy McMurry" <[email protected]> wrote:
> >
> >> I encourage committers to check out Apache Mahout
> >> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> >>
> >> Why Apache Mahout?
> >> 1. provides ML classifiers and functions not available through UIMA
> >> 2. parallel by design, transparently invokes Hadoop
> >> 3. Java and Apache license (every other known toolkit is GPL!)
> >> 4. likely to become standard ML package for Apache
> >>
> >> Why would we use Mahout in cTakes?
> >> cTakes models are "provided", for example PoS tagging.
> >> Retraining these models on your own compute cluster would be difficult (in my opinion).
> >> LibSVM is nice, but it is only one classification method.
> >>
> >> When?
> >> No rush; however, I suggest we don't invest time in porting SINGLE-CPU classifier functions that we will have to parallelize later.
> >>
> >> Summary:
> >> UIMA + Mahout = pipelines + classification
> >>
> >> On Apr 28, 2013, at 4:26 PM, "Savova, Guergana" <[email protected]> wrote:
> >>
> >>> +1
> >>> --guergana
> >>>
> >>> -----Original Message-----
> >>> From: Kaggal, Vinod C. [mailto:[email protected]]
> >>> Sent: Saturday, April 27, 2013 11:21 PM
> >>> To: <[email protected]>
> >>> Cc: <[email protected]>
> >>> Subject: Re: roadmap for Apache cTakes "big data" processing
> >>>
> >>> +1
> >>>
> >>> On Apr 27, 2013, at 9:05 PM, "Chen, Pei" <[email protected]> wrote:
> >>>
> >>>> +1 for UIMA-AS
> >>>>
> >>>> On Apr 27, 2013, at 9:25 PM, "Andy McMurry" <[email protected]> wrote:
> >>>>
> >>>>> I'm writing to gauge community interest and intent for parallel processing with cTakes.
> >>>>>
> >>>>> Apache UIMA is planning "Async Scaleout" as a replacement for CPM.
> >>>>> http://uima.apache.org/doc-uimaas-what.html
> >>>>>
> >>>>> Apache Mahout is likely to become the de facto Apache package for machine learning.
> >>>>> http://mahout.apache.org/
> >>>>>
> >>>>> I believe cTakes will embrace both of these in due time.
> >>>>> Do you agree or do you have a different view?
> >>
