In its current form, this is true for Giraph, but I don't think it's necessarily *required* of the system. Modifying MapReduce on Hadoop to become realtime would be fundamentally wrong (single batch vs. realtime), but Pregel on Hadoop is different enough that maybe it would work (whether it *should* is another question):
You could imagine that maxSupersteps is Integer.MAX_VALUE, and the system only stops when told to. Then you could configure "listeners" on each GiraphMapper which listen on input sources for given vertices, and the compute() method becomes something more like

    while (notPaused()) { processNewData(poll()); }

As new data becomes available, it buffers and/or computes and/or sends out messages. Possibly on a periodic basis, pause() is called, and the system completes a superstep, allowing for some global stuff to be aggregated, etc.

In addition to being a pretty radical departure from BSP, you'd have to make sure that the input sources are able to play nicely with the fault tolerance of Giraph. An input source such as an appending HDFS file would be a good example, I guess.

  -jake

On Tue, Jan 3, 2012 at 9:33 AM, Sebastian Schelter <s...@apache.org> wrote:

> Hi Prasen,
>
> Storm is supposed to process a continuous stream of data while Giraph is
> a parallel batch processing platform. I think these are inherently
> different systems and one cannot easily be transformed into the other.
>
> -sebastian
>
> On 03.01.2012 17:51, prasenjit mukherjee wrote:
> > I have a use case which maps perfectly with the open source
> > implementation of storm ( by twitter team ). I think Giraph can be
> > easily modified to have an implementation simulating storm's use
> > cases. Just curious, if anybody had similar thoughts.
> >
> > -Prasen
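
P.S. To make the listener loop above a bit more concrete, here is a rough, standalone Java sketch. It is only an illustration of the idea, not Giraph code: the class StreamingVertexSketch and the offer() and compute() signatures are made up for this example, the "input source" is just an in-memory queue, and the "aggregation" is a running sum. Fault tolerance, messaging between vertices, and the real superstep barrier are all left out.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch only; none of these names are Giraph APIs. */
class StreamingVertexSketch {
    // Stand-in for an external input source feeding one vertex.
    private final BlockingQueue<Long> input = new LinkedBlockingQueue<>();
    // Set by pause() to ask the loop to finish the current "superstep".
    private final AtomicBoolean paused = new AtomicBoolean(false);
    // Stand-in for per-vertex state; only the compute() thread touches it.
    private long runningSum = 0;

    /** Called by a (hypothetical) listener when new data arrives. */
    void offer(long value) { input.add(value); }

    /** Requests that the current superstep be completed. */
    void pause() { paused.set(true); }

    boolean notPaused() { return !paused.get(); }

    /** Waits briefly for new data; returns null if nothing arrived. */
    Long poll() {
        try {
            return input.poll(10, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            paused.set(true);                   // treat interruption as a pause request
            return null;
        }
    }

    void processNewData(Long value) {
        if (value != null) {
            runningSum += value;
        }
    }

    /**
     * One long-running "superstep": consume data until pause() is called,
     * drain whatever is still buffered, then return the aggregate.
     */
    long compute() {
        while (notPaused()) {
            processNewData(poll());
        }
        // Drain remaining buffered input without blocking.
        Long v;
        while ((v = input.poll()) != null) {
            processNewData(v);
        }
        paused.set(false); // ready for the next superstep
        return runningSum;
    }
}
```

In a real setup compute() would run on a worker thread while listeners call offer() and some coordinator calls pause() periodically; the value returned at each pause is what you would hand to a global aggregator.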