We are certainly interested in online clustering algorithms, and clustering of time series seems like a great fit. (Our text vectorization pipeline has not yet been reworked for the new Mahout "Samsara", but that is of interest too.) What type of compute platform would you require for this?
Currently we are not looking at FPGAs. The most recent, and only real, documentation for Mahout Samsara is in Apache Mahout: Beyond MapReduce: http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html. You may want to check that out as a reference. (I'm sorry for the shameless plug, but it is the only thing that covers almost all Mahout "Samsara" features and architecture up to our previous release.)

Please do let us know if you have any questions about the Samsara platform.

________________________________________
From: Debojyoti Dutta <[email protected]>
Sent: Tuesday, May 17, 2016 8:35:04 PM
To: [email protected]
Subject: Re: [NEW member] Hi

Thanks Andy!

Would like to see if there is interest for algorithms such as 1) clustering text in an online fashion (maybe using LSH or sim/min hash) or 2) online clustering of time series. Basically my focus is "online" or real time.

LSH on GPU sounds very interesting and would love to look at the patches. Personally have helped accelerate LSH on TCAMs long ago, e.g. http://arxiv.org/abs/1006.3514 ....

Is GPU the only hw accel you are looking at or are you considering PCIe FPGA cards too?

debo

On Tue, May 17, 2016 at 5:27 PM, Andrew Palumbo <[email protected]> wrote:

> Welcome, Debojyoti.
> We look forward to your contributions. We are currently working towards
> integrating GPU acceleration for our 0.13 release and LSH sounds like a
> great addition. Could you tell us some more about what you would like to do?
>
> Let us know if we can help you get familiar with the mahout code base. We
> try to implement algorithms in the math-scala module.
>
> Thanks,
>
> Andy
>
> -------- Original message --------
> From: Debojyoti Dutta <[email protected]>
> Date: 05/17/2016 8:11 PM (GMT-05:00)
> To: [email protected]
> Subject: [NEW member] Hi
>
> Hi there,
>
> Am very interested in contributing to Mahout especially towards fast ML
> kernels that can be used for streaming. Have some experience with LSH based
> techniques (including hw accel) for clustering and near neighbors based
> stuff in general.
>
> Was chatting with Sunil and he suggested I join the merry band.
>
> regards
> -Debo~
>

--
-Debo~
