Guys,

As you know, the trend has already begun to shift from batch toward realtime 
and streaming processing. To support streaming graphs and incremental learning 
in Hama, I recently started a full-scale investigation into streaming data 
processing[1] and multi-BSP job pipelines[2].

Basically, the problem is how to process an unstructured input stream and 
pass its output stream to the next "advanced streaming analytics" job 
without extra overhead. There is also a tricky issue in determining where 
"new data" and "updates" should be delivered. Some systems use shared memory 
or support only micro-batch algorithms, but we can solve this problem 
efficiently and directly by message-passing between multiple jobs.
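
To make the idea a bit more concrete, here is a minimal sketch in plain Java. 
It does not use the Hama API: the class PipelineSketch, the methods tokenize, 
count and run, and the end-of-stream marker EOS are all hypothetical names 
made up for illustration, and an in-memory queue stands in for inter-job 
messaging. It only shows the shape of the approach: an upstream job pushes 
each output record to a downstream job as a message, and the downstream job 
updates its state incrementally as messages arrive, with no shared state and 
no micro-batching.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: two "jobs" connected only by a message channel.
// This is NOT the Hama API; a BlockingQueue stands in for inter-job messaging.
class PipelineSketch {
    // End-of-stream marker (an assumption of this sketch).
    private static final String EOS = "\u0000EOS";

    // Job 1: turns raw, unstructured lines into tokens and sends each one downstream.
    static void tokenize(List<String> lines, BlockingQueue<String> out)
            throws InterruptedException {
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.put(token);
                }
            }
        }
        out.put(EOS);
    }

    // Job 2: updates its counts incrementally, one message at a time.
    static Map<String, Integer> count(BlockingQueue<String> in)
            throws InterruptedException {
        Map<String, Integer> counts = new HashMap<>();
        for (String msg = in.take(); !msg.equals(EOS); msg = in.take()) {
            counts.merge(msg, 1, Integer::sum);
        }
        return counts;
    }

    // Runs both jobs concurrently and returns the downstream job's final state.
    static Map<String, Integer> run(List<String> lines) {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                tokenize(lines, channel);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            Map<String, Integer> result = count(channel);
            producer.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b", "b c")));
    }
}
```

In a real multi-job pipeline the channel would be a network transport 
between jobs rather than a queue inside one process, and deciding which 
downstream task receives a given "new data" or "update" message is exactly 
the routing question mentioned above.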

1. https://issues.apache.org/jira/browse/HAMA-883
2. https://issues.apache.org/jira/browse/HAMA-901
