Was that question directed to the community, or were you asking yourself out loud?

On Wed, Jun 1, 2016 at 10:48 AM, Khurrum Nasim <[email protected]>
wrote:

> How are you folks getting over the learning curves associated with things
> like NiFi and Airflow?
>
> > On May 28, 2016, at 9:50 AM, Suneel Marthi <[email protected]> wrote:
> >
> > Debo,
> >
> > On Tue, May 17, 2016 at 9:18 PM, Andrew Palumbo <[email protected]> wrote:
> >
> >> We are certainly interested in online clustering algorithms, and
> >> clustering of time series seems like a great fit. (Our text vectorization
> >> pipeline has not yet been reworked for the new Mahout "Samsara", but that
> >> is an interest too.) What type of compute platform would you require for
> >> this?
> >>
> >
> > For the data processing pipeline, the requirements are:
> >    (a) it should be agnostic to any distributed processing engine like
> > Spark, Flink, etc.
> >    (b) it should be able to scale data pipelines and support back
> > pressure.
> >    (c) it should be able to ingest both batch and streaming data from Spark,
> > Flink, Beam, etc.
> >
> >   So far Apache NiFi seems to fit the bill for all of the above criteria
> > (they don't have a Beam interface yet, but one is being worked on), and they
> > also have an excellent GUI along with features to define common workflow
> > templates that could be imported into custom workflows.
> >
> > The other alternatives being considered are Airbnb's Airflow (proposed for
> > the Apache Incubator; it defines workflows as DAGs in Python) and
> > Apache Beam.
> >
> >
> >
> >>
> >> Currently we are not looking at FPGAs.
> >>
> >
> > If any of the math packages handle FPGAs natively out of the box, let's go
> > for it. But we need not optimize heavily to squeeze the last bit of
> > performance out of FPGAs.
> >
> >
> >>
> >> The most recent, and only real, documentation for Mahout Samsara is in
> >> Apache Mahout: Beyond MapReduce:
> >>
> >> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html
> >>
> >> You may want to check that out as a reference.
> >>
> >> (I'm sorry for the shameless plug, but it is the only thing that covers
> >> almost all Mahout "Samsara" features and architecture up to our previous
> >> release.)
> >>
> >
> > I don't see this as a shameless plug; it's definitely much better than the
> > dozen low-grade books that have been churned out by Packt publishers and
> > went nowhere, other than bringing disrepute to the project and community.
> >
> >
> >>
> >> Please do let us know if you have any questions about the Samsara
> >> platform.
> >> ________________________________________
> >> From: Debojyoti Dutta <[email protected]>
> >> Sent: Tuesday, May 17, 2016 8:35:04 PM
> >> To: [email protected]
> >> Subject: Re: [NEW member] Hi
> >>
> >> Thanks Andy! Would like to see if there is interest for algorithms such as
> >> 1) clustering text in an online fashion (maybe using LSH or sim/min hash)
> >> or 2) online clustering of time series. Basically my focus is "online" or
> >> real time.
> >>
> >> LSH on GPU sounds very interesting and would love to look at the patches.
> >> Personally have helped accelerate LSH on TCAMs long ago, e.g.
> >> http://arxiv.org/abs/1006.3514 .... Is GPU the only hw accel you are
> >> looking at or are you considering PCIe FPGA cards too?
> >>
> >> debo
> >>
> >> On Tue, May 17, 2016 at 5:27 PM, Andrew Palumbo <[email protected]>
> >> wrote:
> >>
> >>> Welcome, Debojyoti.
> >>> We look forward to your contributions. We are currently working towards
> >>> integrating GPU acceleration for our 0.13 release, and LSH sounds like a
> >>> great addition. Could you tell us some more about what you would like to
> >>> do?
> >>>
> >>> Let us know if we can help you get familiar with the Mahout code base. We
> >>> try to implement algorithms in the math-scala module.
> >>>
> >>> Thanks,
> >>>
> >>> Andy
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> -------- Original message --------
> >>> From: Debojyoti Dutta <[email protected]>
> >>> Date: 05/17/2016 8:11 PM (GMT-05:00)
> >>> To: [email protected]
> >>> Subject: [NEW member] Hi
> >>>
> >>> Hi there,
> >>>
> >>> Am very interested in contributing to Mahout especially towards fast ML
> >>> kernels that can be used for streaming. Have some experience with LSH
> >>> based techniques (including hw accel) for clustering and near neighbors
> >>> based stuff in general.
> >>>
> >>> Was chatting with Sunil and he suggested I join the merry band.
> >>>
> >>> regards
> >>> -Debo~
> >>>
> >>
> >>
> >>
> >> --
> >> -Debo~
> >>
>
>
