Interesting idea. You could implement the code that loads the new input data in preSuperstep(). If the input data is resilient (e.g. stored on HDFS), the system would inherit Giraph's reliability guarantees. Implementing an external trigger to stop the application wouldn't be too difficult either (e.g. drop a stamp file somewhere and check for it every n supersteps). Still, as I'm not that familiar with Storm: what would be the advantages of this over Storm?
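
To make the stamp-file idea concrete, here is a minimal sketch of the check in plain Java. StopTrigger, its fields, and shouldStop() are hypothetical names and not part of Giraph. It uses java.nio against a local path as a stand-in; on HDFS the existence check would go through the Hadoop file system API instead. The idea would be to call something like shouldStop(superstep) from preSuperstep() and only let vertices halt once it returns true.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not a Giraph class): polls for a stop marker file. */
final class StopTrigger {
    private final Path stampFile;     // marker an operator creates to request shutdown
    private final long checkInterval; // poll only every n supersteps

    StopTrigger(Path stampFile, long checkInterval) {
        if (checkInterval <= 0) {
            throw new IllegalArgumentException("checkInterval must be positive");
        }
        this.stampFile = stampFile;
        this.checkInterval = checkInterval;
    }

    /** Returns true only on a polling superstep where the marker file exists. */
    boolean shouldStop(long superstep) {
        if (superstep % checkInterval != 0) {
            return false; // skip the file system check on most supersteps
        }
        return Files.exists(stampFile);
    }
}
```

Polling only every n supersteps keeps the file system lookups cheap, at the cost of reacting to the stop request up to n - 1 supersteps late.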


On 1/3/12 5:30 PM, prasenjit mukherjee wrote:
As Jake mentioned, you can have continuous processing by making the
mappers in Giraph stop only on an external condition (i.e. when
specifically asked to do so): a vertex calls voteToHalt() only if
that condition is satisfied.

Additionally, the VertexInputSource can be modified to read from a
continuous input (like ActiveMQ or even a port), potentially outside
of HDFS.

On 1/3/12, Sebastian Schelter wrote:
Hi Prasen,

Storm is supposed to process a continuous stream of data while Giraph is
a parallel batch processing platform. I think these are inherently
different systems and one cannot easily be transformed into the other.


On 03.01.2012 17:51, prasenjit mukherjee wrote:
I have a use case which maps perfectly onto the open source
implementation of Storm (by the Twitter team). I think Giraph can be
easily modified to provide an implementation covering Storm's use
cases. Just curious whether anybody has had similar thoughts.

