Hello Sir,
> > 2) Along with Dataflow model, I would also borrow some of the features > > from MillWheel [1] and FlumeJava [2] (features such as Fault-tolerance, > > running efficient data parallel pipelines, etc). > > Perfect. Do you have something more concrete in mind? Any use cases? > Design ideas? > Following is my brief design idea that incorporates features from both FlumeJava and MillWheel: Let us assume each steps in stream processing to be a directed acyclic graph with output corresponded by directed edges, now for stream processing each output would have three parameters, i.e; key, value, timestamp (Key here refers to as processing request, value as its output and timestamp the time we received request) . Assuming each processFunction(user defined) to be node of the directed acyclic graph, we would send back the ACK(acknowledgment) signal once the data in i+1- th node is received from i - th node (this property ensures data is not lost in the process). Data on a given node is stored in a std::map<uint, std::string>. A point worth ,mentioning here is that each map would exist for a defined time period (few millisecond) in case ACK is not received for that time period, map for that specific node will be cleared. Now in case there are many parallel data pipelines directed edges of final output from each each pipeline will be concatenated using "join()" function and then further processed by the resulting function (analogous to reduce). A point worth to mention here is that instead of ACK model, we can also consider Uber's RingPOP RPC model here: https://eng.uber.com/intro-to-ringpop/ PS:There is a Data-flow based framework under development known as Apache Beam: https://github.com/apache/incubator-beam, and can be looked for inspiration. > > We definitely will be here to discuss things as you start putting out > ideas, questions, suggestions, etc. I think you have already started > looking at HPX itself, if not - it might be a good time to start doing so. > Along with adding Framework for Dataflow/Map-Reduce to HPX i also plan to have a pluggable storage and cache interface provided so that framework user can store data into various storage system like(BigTable, Cassandra, etc), for pluggable cache interface user can store data into any cache system available (Redis/Memcache). PS: there won't be external plugin needed for both these functions just two unix port for communications. Thanks, Aalekh Nigam https://in.linkedin.com/in/aalekh-nigam-a7962064
_______________________________________________ hpx-users mailing list [email protected] https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
