To the community, active committers, etc.
> On Jun 1, 2016, at 11:01 AM, Suneel Marthi <[email protected]> wrote:
>
> Was that question directed to the community or were u asking urself loud ?
>
> On Wed, Jun 1, 2016 at 10:48 AM, Khurrum Nasim <[email protected]> wrote:
>
>> How are you folks getting over the learning curves associated with things
>> like Nifi and AirFlow ?
>>
>>> On May 28, 2016, at 9:50 AM, Suneel Marthi <[email protected]> wrote:
>>>
>>> Debo,
>>>
>>> On Tue, May 17, 2016 at 9:18 PM, Andrew Palumbo <[email protected]> wrote:
>>>
>>>> We are certainly interested in online clustering Algorithms, and
>>>> clustering of timeseries seems like a great fit. (our text vectorization
>>>> pipeline has not yet been reworked for the new Mahout "Samsara" but that is
>>>> an interest too). What type of compute platform would you require for this?
>>>>
>>>
>>> For data processing pipeline, the requirements are :
>>> (a) it should be agnostic to any distributed processing engine like
>>> Spark, Flink, etc.
>>> (b) should be able to scale data pipelines and be able to support back
>>> pressure.
>>> (c) should be able to ingest both Batch and Streaming data from Spark,
>>> Flink, Beam etc...
>>>
>>> So far Apache NiFi seems to fit the bill for all of the above criteria
>>> (they don't have a Beam interface yet but is being worked on) and they also
>>> have an excellent GUI along with features to define common workflow
>>> templates that could be imported into custom workflows.
>>>
>>> The other alternatives being considered are Airbnb's Airflow - proposed for
>>> Apache incubator and defines workflows as a DAG in python,
>>> Apache Beam.
>>>
>>>>
>>>> Currently we are not looking at FPGAs.
>>>>
>>>
>>> If any of the Math packages handle FPGAs natively out-of-the-box, let's go
>>> for it. But we need not optimize the heck to get the last bit of
>>> performance from FPGAs.
>>>
>>>>
>>>> The most recent, and only real Documentation for Mahout Samsara is in
>>>> Apache Mahout: Beyond MapReduce:
>>>>
>>>> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html .
>>>> You may want to check that out as a reference.
>>>>
>>>> (I'm sorry for the shameless plug but it is the only thing that covers most
>>>> all Mahout "Samsara" features and architecture up to our previous release)
>>>>
>>>
>>> I don't see this as a shameless plug, it's definitely much better than the
>>> dozen low grade books that have been churned out by PackT publishers and
>>> went nowhere, other than bringing disrepute to the project and community.
>>>
>>>>
>>>> Please do let us know if you have any questions about the Samsara platform.
>>>> ________________________________________
>>>> From: Debojyoti Dutta <[email protected]>
>>>> Sent: Tuesday, May 17, 2016 8:35:04 PM
>>>> To: [email protected]
>>>> Subject: Re: [NEW member] Hi
>>>>
>>>> Thanks Andy! Would like to see if there is interest for algorithms such as
>>>> 1) clustering text in an online fashion (maybe using LSH or sim/min hash)
>>>> or 2) online clustering of time series. Basically my focus is "online" or
>>>> real time.
>>>>
>>>> LSH on GPU sounds very interesting and would love to look at the patches.
>>>> Personally have helped accelerate LSH on TCAMs long ago e.g.
>>>> http://arxiv.org/abs/1006.3514 .... Is GPU the only hw accel you are
>>>> looking at or are you considering PCIe FPGA cards too?
>>>>
>>>> debo
>>>>
>>>> On Tue, May 17, 2016 at 5:27 PM, Andrew Palumbo <[email protected]>
>>>> wrote:
>>>>
>>>>> Welcome, Debojyoti.
>>>>> We look forward to your contributions. We are currently working towards
>>>>> integrating GPU acceleration for our 0.13 release and LSH sounds like a
>>>>> great addition. Could you tell us some more about what you would like to
>>>>> do?
>>>>>
>>>>> Let us know if we can help you get familiar with the mahout code base. We
>>>>> try to implement algorithms in the math-scala module.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andy
>>>>>
>>>>> -------- Original message --------
>>>>> From: Debojyoti Dutta <[email protected]>
>>>>> Date: 05/17/2016 8:11 PM (GMT-05:00)
>>>>> To: [email protected]
>>>>> Subject: [NEW member] Hi
>>>>>
>>>>> Hi there,
>>>>>
>>>>> Am very interested in contributing to Mahout especially towards fast ML
>>>>> kernels that can be used for streaming. Have some experience with LSH based
>>>>> techniques (including hw accel) for clustering and near neighbors based
>>>>> stuff in general.
>>>>>
>>>>> Was chatting with Sunil and he suggested I join the merry band.
>>>>>
>>>>> regards
>>>>> -Debo~
>>>>>
>>>>
>>>> --
>>>> -Debo~
>>>>
