Guys,

As you know, the trend has already begun to shift toward real-time and streaming processing instead of batch. To support streaming graphs and incremental learning in Hama, I recently began a full-scale investigation into streaming data processing[1] and multi-BSP job pipelines[2].
Basically, the problem is how to process an unstructured input stream and pass its output stream to the next "advanced streaming analytics" job without extra overhead. There is also a tricky issue in deciding where "new data" and "updates" should be delivered. Some existing systems rely on shared memory or support only micro-batch algorithms, but we can solve this problem efficiently and directly by passing messages between multiple jobs.

1. https://issues.apache.org/jira/browse/HAMA-883
2. https://issues.apache.org/jira/browse/HAMA-901
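To make the message-passing idea between pipelined jobs a bit more concrete, here is a minimal sketch. It is not Hama's API: plain Python threads and in-process queues stand in for BSP tasks and Hama's messaging, and every name (`upstream_job`, `downstream_task`, `run_pipeline`, the `SENTINEL` end-of-stream marker) is hypothetical. The upstream job turns unstructured text into keyed records and sends each one straight to the downstream task that owns its key (hash partitioning, one possible answer to "where should new data and updates go"). Each downstream task then updates its local state incrementally as messages arrive, with no intermediate batch.

```python
import queue
import threading
from collections import Counter

SENTINEL = None  # hypothetical end-of-stream marker, not a Hama concept


def upstream_job(lines, inboxes):
    """Parse unstructured lines and send each record directly to the
    downstream task that owns its key (hash partitioning)."""
    for line in lines:
        for word in line.split():
            key = word.lower()
            # Same key always maps to the same task within this process.
            inboxes[hash(key) % len(inboxes)].put((key, 1))
    for inbox in inboxes:
        inbox.put(SENTINEL)  # tell every downstream task the stream ended


def downstream_task(inbox, state):
    """Apply each incoming update to local state as soon as it arrives."""
    while True:
        msg = inbox.get()
        if msg is SENTINEL:
            break
        key, delta = msg
        state[key] += delta


def run_pipeline(lines, num_tasks=2):
    """Wire one upstream job to num_tasks downstream tasks via queues."""
    inboxes = [queue.Queue() for _ in range(num_tasks)]
    states = [Counter() for _ in range(num_tasks)]
    workers = [
        threading.Thread(target=downstream_task, args=(inboxes[i], states[i]))
        for i in range(num_tasks)
    ]
    for w in workers:
        w.start()
    upstream_job(lines, inboxes)
    for w in workers:
        w.join()
    # Merge per-task state only to inspect the result; a real pipeline
    # would keep it partitioned and keep consuming the stream.
    merged = Counter()
    for s in states:
        merged.update(s)
    return merged
```

For example, `run_pipeline(["a b a", "B c"], num_tasks=3)` counts `a` and `b` twice and `c` once. Routing by key hash means every update for a given key lands in the same task, so no shared memory or locking across tasks is needed.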
