Was that question directed to the community, or were you asking yourself out loud?

On Wed, Jun 1, 2016 at 10:48 AM, Khurrum Nasim <[email protected]> wrote:
> How are you folks getting over the learning curves associated with things
> like NiFi and Airflow?
>
> On May 28, 2016, at 9:50 AM, Suneel Marthi <[email protected]> wrote:
> >
> > Debo,
> >
> > On Tue, May 17, 2016 at 9:18 PM, Andrew Palumbo <[email protected]> wrote:
> >
> >> We are certainly interested in online clustering algorithms, and
> >> clustering of time series seems like a great fit. (Our text vectorization
> >> pipeline has not yet been reworked for the new Mahout "Samsara", but that
> >> is an interest too.) What type of compute platform would you require for
> >> this?
> >>
> >
> > For the data processing pipeline, the requirements are:
> > (a) it should be agnostic to any distributed processing engine like
> > Spark, Flink, etc.
> > (b) it should be able to scale data pipelines and support back
> > pressure.
> > (c) it should be able to ingest both batch and streaming data from Spark,
> > Flink, Beam, etc.
> >
> > So far Apache NiFi seems to fit the bill for all of the above criteria
> > (they don't have a Beam interface yet, but one is being worked on), and
> > they also have an excellent GUI along with features to define common
> > workflow templates that could be imported into custom workflows.
> >
> > The other alternatives being considered are Airbnb's Airflow (proposed for
> > the Apache Incubator; it defines workflows as a DAG in Python) and
> > Apache Beam.
> >
> >>
> >> Currently we are not looking at FPGAs.
> >>
> >
> > If any of the math packages handle FPGAs natively out of the box, let's go
> > for it. But we need not optimize the heck out of it to get the last bit of
> > performance from FPGAs.
> >
> >>
> >> The most recent, and only real, documentation for Mahout Samsara is in
> >> Apache Mahout: Beyond MapReduce:
> >>
> >> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html
> >>
> >> You may want to check that out as a reference.
> >>
> >> (I'm sorry for the shameless plug, but it is the only thing that covers
> >> most all Mahout "Samsara" features and architecture up to our previous
> >> release.)
> >>
> >
> > I don't see this as a shameless plug; it's definitely much better than the
> > dozen low-grade books that have been churned out by Packt publishers and
> > went nowhere, other than bringing disrepute to the project and community.
> >
> >>
> >> Please do let us know if you have any questions about the Samsara
> >> platform.
> >> ________________________________________
> >> From: Debojyoti Dutta <[email protected]>
> >> Sent: Tuesday, May 17, 2016 8:35:04 PM
> >> To: [email protected]
> >> Subject: Re: [NEW member] Hi
> >>
> >> Thanks Andy! Would like to see if there is interest for algorithms such as
> >> 1) clustering text in an online fashion (maybe using LSH or sim/min hash)
> >> or 2) online clustering of time series. Basically my focus is "online" or
> >> real time.
> >>
> >> LSH on GPU sounds very interesting and would love to look at the patches.
> >> Personally have helped accelerate LSH on TCAMs long ago, e.g.
> >> http://arxiv.org/abs/1006.3514 .... Is GPU the only hw accel you are
> >> looking at, or are you considering PCIe FPGA cards too?
> >>
> >> debo
> >>
> >> On Tue, May 17, 2016 at 5:27 PM, Andrew Palumbo <[email protected]>
> >> wrote:
> >>
> >>> Welcome, Debojyoti.
> >>> We look forward to your contributions. We are currently working towards
> >>> integrating GPU acceleration for our 0.13 release, and LSH sounds like a
> >>> great addition. Could you tell us some more about what you would like to
> >>> do?
> >>>
> >>> Let us know if we can help you get familiar with the Mahout code base. We
> >>> try to implement algorithms in the math-scala module.
> >>>
> >>> Thanks,
> >>>
> >>> Andy
> >>>
> >>> -------- Original message --------
> >>> From: Debojyoti Dutta <[email protected]>
> >>> Date: 05/17/2016 8:11 PM (GMT-05:00)
> >>> To: [email protected]
> >>> Subject: [NEW member] Hi
> >>>
> >>> Hi there,
> >>>
> >>> Am very interested in contributing to Mahout, especially towards fast ML
> >>> kernels that can be used for streaming. Have some experience with
> >>> LSH-based techniques (including hw accel) for clustering and
> >>> near-neighbors-based stuff in general.
> >>>
> >>> Was chatting with Suneel and he suggested I join the merry band.
> >>>
> >>> regards
> >>> -Debo~
> >>>
> >>
> >> --
> >> -Debo~
> >>
> >
