Hi Folks,

I was thinking about how to drive this initiative and had some ideas around 
execution; I would love some feedback:

1) While the design discussion is underway, I was thinking of building a small 
prototype with one of the algorithms. The prototype would be a first-cut 
representation of the design, in which we express one algorithm as a Storm 
topology. Looking at the list of algorithms we're considering bringing over 
from SAMOA 
(https://samoa.incubator.apache.org/documentation/SAMOA-and-Machine-Learning.html), 
distributed stream clustering looks the most valuable for a prototype. 
Thoughts?
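To make the prototype idea concrete, here is a rough, entirely hypothetical sketch of the kind of per-tuple online centroid update a stream-clustering bolt might perform (this is not the SAMOA CluStream implementation; the class and method names are invented for illustration):

```java
import java.util.Arrays;

// Hypothetical sketch of the per-tuple update a stream-clustering bolt
// might perform; NOT the actual SAMOA distributed clustering algorithm.
public class OnlineClusterer {
    private final double[][] centroids;   // current cluster centers
    private final long[] counts;          // points assigned to each center

    public OnlineClusterer(double[][] initialCentroids) {
        this.centroids = initialCentroids;
        this.counts = new long[initialCentroids.length];
    }

    // Assign the point to its nearest centroid and nudge that centroid
    // toward it (incremental running-mean update); returns the cluster index.
    public int update(double[] point) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double d = 0;
            for (int j = 0; j < point.length; j++) {
                double diff = centroids[i][j] - point[j];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = i; }
        }
        counts[best]++;
        double lr = 1.0 / counts[best];   // learning rate 1/n gives a running mean
        for (int j = 0; j < point.length; j++) {
            centroids[best][j] += lr * (point[j] - centroids[best][j]);
        }
        return best;
    }

    public double[] centroid(int i) { return centroids[i]; }

    public static void main(String[] args) {
        OnlineClusterer c = new OnlineClusterer(new double[][] {{0, 0}, {10, 10}});
        c.update(new double[] {1, 1});
        c.update(new double[] {9, 9});
        System.out.println(Arrays.toString(c.centroid(0))); // first centroid pulled toward (1,1)
    }
}
```

In the prototype this update would live inside a bolt's execute() method, with a spout feeding the points; the point of the sketch is just that the per-tuple state update maps naturally onto Storm's bolt model.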


2) I would like to leverage some of the ideas in Michelangelo, as well as my 
previous experience building a tool that versions, deploys, and associates ML 
models with newly arriving windows of data. In reality I feel this is a 
completely orthogonal initiative that we also need to design out. Should it be 
part of the design doc at this point? Thoughts?
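As a strawman for what I mean by associating models with windows of data, here is a tiny, entirely hypothetical sketch (all names invented for illustration; nothing here is in the current design doc):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: bookkeeping that records which model version
// scored which window of data. Names are invented for illustration.
public class ModelWindowRegistry {
    // windowId -> model version that scored that window
    private final Map<Long, String> windowToModel = new HashMap<>();
    private String activeVersion;

    // "Deploy" a new model version; subsequent windows use it.
    public void deploy(String version) { this.activeVersion = version; }

    // Called as each new window of data arrives; records the association.
    public String scoreWindow(long windowId) {
        windowToModel.put(windowId, activeVersion);
        return activeVersion;
    }

    // Look up which model version was used for a past window.
    public String modelFor(long windowId) { return windowToModel.get(windowId); }
}
```

The sketch is meant to show why I suspect this is orthogonal: the versioning and deployment bookkeeping can evolve independently of the topologies that do the actual scoring.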

3) Should we address security in streaming machine learning models for the 
first release?

4) The design doc mentions a GenericMLOutputModelSink. I was thinking of this 
as a factory method with underlying representations of the various sinks that 
already exist, which I'm hoping to leverage; see here: 
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_storm-component-guide/content/ch_storm-connectors.html
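A rough sketch of what I mean by "factory method" here, with placeholder names (the real version would delegate to the existing Storm connector bolts such as the Kafka and HDFS bolts rather than to these stubs):

```java
// Hypothetical sketch of GenericMLOutputModelSink as a factory over
// existing sinks; the implementations below are stubs standing in for
// the real Storm connector bolts (Kafka, HDFS, HBase, etc.).
public class GenericMLOutputModelSink {

    public interface OutputSink {
        String write(String modelOutput);
    }

    // Factory method: select the underlying sink implementation by name.
    public static OutputSink create(String type) {
        switch (type) {
            case "kafka":
                return out -> "kafka:" + out;   // stub; would wrap the Kafka bolt
            case "hdfs":
                return out -> "hdfs:" + out;    // stub; would wrap the HDFS bolt
            default:
                throw new IllegalArgumentException("unknown sink: " + type);
        }
    }
}
```

The appeal of this shape is that the topologies only ever see the OutputSink interface, so adding a new connector is a change inside the factory rather than in every topology.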



@Karthik Ramasamy<mailto:[email protected]> et al., I would love to get 
thoughts on how we proceed with this initiative at this point. In the meantime 
I will get started with 1) to test the feasibility of this design.

Regards







________________________________
From: Saikat Kanjilal <[email protected]>
Sent: Monday, May 7, 2018 2:31 PM
To: [email protected]
Subject: [DISCUSS] A design proposal for incorporating machine learning 
algorithms into heron


Hello Dev community,

I have created the initial API design documentation around building Storm 
topologies for a set of streaming machine learning algorithms here: 
https://docs.google.com/document/d/1LrO7XRcMxJoMM83wjRd-Ov74VAaomA_mXOAhCStgGng/edit?usp=sharing
This is very much a work in progress, but I wanted to start getting early 
feedback from the community, since representing a streaming ML pipeline with 
Heron involves a lot of complex operations. This design leverages Apache SAMOA 
to figure out which algorithms to focus on bringing into Heron.

Thank you, Karthik Ramasamy, for your mentoring on this. The goal is to 
represent all the phase 1 algorithms as Storm topologies and then to evolve 
this into a streamlet-based architecture. I would really appreciate feedback 
from the community.

While you comment on the initial approach, I will: 1) finish the design for 
the rest of the phase 1 algorithms, and 2) start the design for a Heron 
streamlet-based architecture to run on top of the Storm-based topologies.

I look forward to a productive discussion around the design.
