Re: [hpx-users] GSoC 2016 [Implement a Map/Reduce Framework]

Aalekh Nigam Thu, 10 Mar 2016 05:27:13 -0800

Hello Sir,


> > 2) Along with Dataflow model, I would also borrow some of the features
> > from MillWheel [1] and FlumeJava [2] (features such as Fault-tolerance,
> > running efficient data parallel pipelines, etc).
>
> Perfect. Do you have something more concrete in mind? Any use cases?
> Design ideas?
>

Here is a brief design idea that incorporates features from both
FlumeJava and MillWheel:

Let us model the stream computation as a directed acyclic graph (DAG):
each user-defined processFunction is a node, and each directed edge
carries the output of one node to the next. Every output has three
parameters: key, value and timestamp (the key identifies the processing
request, the value is its output, and the timestamp is the time at which
we received the request). Node i+1 sends an ACK (acknowledgment) signal
back to node i once it has received the data; this ensures that data is
not lost in the process. The data on a given node is stored in a
std::map<uint, std::string>. Note that each map exists only for a defined
time period (a few milliseconds): if no ACK is received within that
period, the map for that node is cleared. When there are many parallel
data pipelines, the final outputs of the pipelines are concatenated using
a "join()" function and then processed by a final function (analogous to
reduce).
Instead of the ACK model, we could also consider the RPC model used by
Uber's RingPOP:
https://eng.uber.com/intro-to-ringpop/
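
To make the ACK idea more concrete, here is a rough C++ sketch of the
per-node buffer and the join step. It is only an illustration, not HPX
code: the names Record, NodeBuffer, hold, ack and expire are hypothetical,
and I am assuming that the "defined time period" restarts whenever an ACK
arrives and that time is passed in explicitly.

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// One output on an edge of the graph: (key, value, timestamp).
struct Record {
    unsigned int key;             // identifies the processing request
    std::string value;            // output produced for that request
    Clock::time_point timestamp;  // time the request was received
};

// Hypothetical buffer held by node i for data sent to node i+1.
class NodeBuffer {
public:
    explicit NodeBuffer(std::chrono::milliseconds ttl) : ttl_(ttl) {}

    // Keep a copy of a record until node i+1 acknowledges it.
    void hold(const Record& r, Clock::time_point now) {
        if (pending_.empty()) window_start_ = now;  // start the time period
        pending_[r.key] = r.value;
    }

    // ACK from node i+1: drop the copy and restart the time period.
    // Returns false if the key was not pending.
    bool ack(unsigned int key, Clock::time_point now) {
        if (pending_.erase(key) == 0) return false;
        window_start_ = now;
        return true;
    }

    // Clear the whole map if no ACK arrived within the time period.
    // Returns true if the map was cleared.
    bool expire(Clock::time_point now) {
        if (pending_.empty() || now - window_start_ < ttl_) return false;
        pending_.clear();
        return true;
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::chrono::milliseconds ttl_;
    Clock::time_point window_start_{};
    std::map<unsigned int, std::string> pending_;
};

// Concatenate the final outputs of parallel pipelines, in pipeline order,
// so that a single reduce-like function can process them.
std::vector<Record> join(const std::vector<std::vector<Record>>& pipelines) {
    std::vector<Record> joined;
    for (const auto& out : pipelines)
        joined.insert(joined.end(), out.begin(), out.end());
    return joined;
}
```

A node would call hold() when it sends a record, ack() when the
acknowledgment comes back, and expire() periodically from a timer. Whether
an expired record should be re-sent or simply dropped is left open here.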

PS: There is a dataflow-based framework under development, Apache Beam
(https://github.com/apache/incubator-beam), which can be looked at for
inspiration.

>
> We definitely will be here to discuss things as you start putting out
> ideas, questions, suggestions, etc. I think you have already started
> looking at HPX itself, if not - it might be a good time to start doing so.
>

Along with adding a Dataflow/Map-Reduce framework to HPX, I also plan to
provide pluggable storage and cache interfaces, so that a framework user
can store data in various storage systems (BigTable, Cassandra, etc.) and
in any available cache system (Redis/Memcache). PS: no external plugin
would be needed for either of these, just two Unix ports for
communication.
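
As a rough illustration of what "pluggable" could mean here, the sketch
below puts storage and cache behind one small interface. All names in it
(KeyValueStore, InMemoryStore, CachedStore) are hypothetical, and the
in-memory class only stands in for a real backend such as Cassandra or
Redis, which would talk to its server over a port instead.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface that every storage or cache backend implements.
class KeyValueStore {
public:
    virtual ~KeyValueStore() = default;
    virtual void put(const std::string& key, const std::string& value) = 0;
    // Returns true and fills `out` if the key exists.
    virtual bool get(const std::string& key, std::string& out) const = 0;
};

// Stand-in backend that keeps everything in process memory.
class InMemoryStore : public KeyValueStore {
public:
    void put(const std::string& key, const std::string& value) override {
        data_[key] = value;
    }
    bool get(const std::string& key, std::string& out) const override {
        auto it = data_.find(key);
        if (it == data_.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<std::string, std::string> data_;
};

// Combines a cache and a storage backend chosen by the framework user.
class CachedStore {
public:
    CachedStore(std::shared_ptr<KeyValueStore> cache,
                std::shared_ptr<KeyValueStore> storage)
        : cache_(std::move(cache)), storage_(std::move(storage)) {}

    // Write to storage first, then refresh the cache.
    void put(const std::string& key, const std::string& value) {
        storage_->put(key, value);
        cache_->put(key, value);
    }

    // Try the cache first; on a miss, read storage and fill the cache.
    bool get(const std::string& key, std::string& out) {
        if (cache_->get(key, out)) return true;
        if (!storage_->get(key, out)) return false;
        cache_->put(key, out);
        return true;
    }

private:
    std::shared_ptr<KeyValueStore> cache_;
    std::shared_ptr<KeyValueStore> storage_;
};
```

With this shape, swapping backends only means passing a different
KeyValueStore implementation to CachedStore; the framework code itself
would not change.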

Thanks,
Aalekh Nigam
https://in.linkedin.com/in/aalekh-nigam-a7962064

_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
