To the community, active committers, etc.


> On Jun 1, 2016, at 11:01 AM, Suneel Marthi <[email protected]> wrote:
> 
> Was that question directed to the community, or were you asking yourself out loud?
> 
> On Wed, Jun 1, 2016 at 10:48 AM, Khurrum Nasim <[email protected]>
> wrote:
> 
>> How are you folks getting over the learning curves associated with things
>> like NiFi and Airflow?
>> 
>>> On May 28, 2016, at 9:50 AM, Suneel Marthi <[email protected]> wrote:
>>> 
>>> Debo,
>>> 
>>> On Tue, May 17, 2016 at 9:18 PM, Andrew Palumbo <[email protected]> wrote:
>>> 
>>>> We are certainly interested in online clustering algorithms, and
>>>> clustering of time series seems like a great fit. (Our text vectorization
>>>> pipeline has not yet been reworked for the new Mahout "Samsara", but that
>>>> is an interest too.) What type of compute platform would you require for
>>>> this?
>>>> 
>>> 
>>> For the data processing pipeline, the requirements are:
>>>   (a) it should be agnostic to the distributed processing engine (Spark,
>>> Flink, etc.);
>>>   (b) it should be able to scale data pipelines and support back
>>> pressure;
>>>   (c) it should be able to ingest both batch and streaming data from Spark,
>>> Flink, Beam, etc.
>>> 
>>> So far Apache NiFi seems to fit the bill for all of the above criteria
>>> (they don't have a Beam interface yet, but it is being worked on), and they
>>> also have an excellent GUI along with features to define common workflow
>>> templates that could be imported into custom workflows.
>>> 
>>> The other alternatives being considered are Airbnb's Airflow (proposed for
>>> the Apache Incubator; it defines workflows as a DAG in Python) and
>>> Apache Beam.
>>> 
>>> 
>>> 
>>>> 
>>>> Currently we are not looking at FPGAs.
>>>> 
>>> 
>>> If any of the math packages handle FPGAs natively out of the box, let's go
>>> for it. But we need not optimize the heck out of it to get the last bit of
>>> performance from FPGAs.
>>> 
>>> 
>>>> 
>>>> The most recent, and only real, documentation for Mahout Samsara is in
>>>> "Apache Mahout: Beyond MapReduce":
>>>> 
>>>> http://www.weatheringthroughtechdays.com/2016/02/mahout-samsara-book-is-out.html
>>>> 
>>>> You may want to check that out as a reference.
>>>> 
>>>> (I'm sorry for the shameless plug, but it is the only thing that covers
>>>> almost all Mahout "Samsara" features and architecture up to our previous
>>>> release.)
>>>> 
>>> 
>>> I don't see this as a shameless plug; it's definitely much better than the
>>> dozen low-grade books that have been churned out by Packt Publishing and
>>> went nowhere, other than bringing disrepute to the project and community.
>>> 
>>> 
>>>> 
>>>> Please do let us know if you have any questions about the Samsara
>>>> platform.
>>>> ________________________________________
>>>> From: Debojyoti Dutta <[email protected]>
>>>> Sent: Tuesday, May 17, 2016 8:35:04 PM
>>>> To: [email protected]
>>>> Subject: Re: [NEW member] Hi
>>>> 
>>>> Thanks Andy! Would like to see if there is interest in algorithms such as
>>>> 1) clustering text in an online fashion (maybe using LSH or sim/min hash)
>>>> or 2) online clustering of time series. Basically my focus is "online" or
>>>> real time.
>>>> 
>>>> LSH on GPU sounds very interesting and would love to look at the patches.
>>>> Personally have helped accelerate LSH on TCAMs long ago (e.g.
>>>> http://arxiv.org/abs/1006.3514). Is GPU the only hw accel you are
>>>> looking at, or are you considering PCIe FPGA cards too?
>>>> 
>>>> debo
>>>> 
>>>> On Tue, May 17, 2016 at 5:27 PM, Andrew Palumbo <[email protected]>
>>>> wrote:
>>>> 
>>>>> Welcome, Debojyoti.
>>>>> We look forward to your contributions. We are currently working towards
>>>>> integrating GPU acceleration for our 0.13 release, and LSH sounds like a
>>>>> great addition. Could you tell us some more about what you would like to
>>>>> do?
>>>>> 
>>>>> Let us know if we can help you get familiar with the Mahout code base. We
>>>>> try to implement algorithms in the math-scala module.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Andy
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -------- Original message --------
>>>>> From: Debojyoti Dutta <[email protected]>
>>>>> Date: 05/17/2016 8:11 PM (GMT-05:00)
>>>>> To: [email protected]
>>>>> Subject: [NEW member] Hi
>>>>> 
>>>>> Hi there,
>>>>> 
>>>>> Am very interested in contributing to Mahout, especially towards fast ML
>>>>> kernels that can be used for streaming. Have some experience with
>>>>> LSH-based techniques (including hw accel) for clustering and
>>>>> near-neighbor-based stuff in general.
>>>>> 
>>>>> Was chatting with Sunil and he suggested I join the merry band.
>>>>> 
>>>>> regards
>>>>> -Debo~
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> -Debo~
>>>> 
>> 
>> 
