Re: Use Giraph to simulate Storm ?

Jake Mannix Tue, 03 Jan 2012 09:52:37 -0800

In its current form, this is true for Giraph, but I don't think it's
necessarily *required* of the system.  Modifying MapReduce on Hadoop to
become realtime would be fundamentally wrong (single batch vs. realtime),
but Pregel on Hadoop is different enough that maybe it would work
(whether it *should* is another question):

You could imagine that maxSupersteps is Integer.MAX_VALUE, and the
system only stops when told to.  Then you could configure "listeners" on
each GiraphMapper which listen on input sources for given vertices, and
the compute() method becomes something more like

   while(notPaused()) { processNewData(poll()); }

As new data becomes available, it buffers and/or computes and/or sends
out messages.  Possibly on a periodic basis, pause() is called and the
system completes a superstep, allowing some global state to be
aggregated, etc.
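
To make the loop above concrete, here is a minimal single-vertex sketch in
plain Java.  It is not Giraph code: the class StreamingVertexSketch and the
methods offer(), pause(), notPaused(), poll(), processNewData() and
superstepTotals() are all hypothetical names, the "input source" is just an
in-memory queue, and the "aggregation" is a running total recorded each
time a superstep ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names are Giraph APIs.
class StreamingVertexSketch {
    // Stand-in for a "listener" on an external input source.
    private final BlockingQueue<Long> input = new LinkedBlockingQueue<>();
    private final AtomicBoolean paused = new AtomicBoolean(false);
    // Running total recorded at the end of each superstep.
    private final List<Long> superstepTotals = new ArrayList<>();
    private long runningSum = 0;

    void offer(long value) { input.add(value); }
    void pause() { paused.set(true); }
    boolean notPaused() { return !paused.get(); }
    List<Long> superstepTotals() { return superstepTotals; }

    // Wait briefly for new data; returns null if nothing arrived in time.
    private Long poll() {
        try {
            return input.poll(10, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            paused.set(true); // treat interruption as a request to pause
            return null;
        }
    }

    private void processNewData(Long value) {
        if (value != null) {
            runningSum += value;
        }
    }

    // One "superstep": consume input until paused, then hit the barrier.
    void compute() {
        while (notPaused()) {
            processNewData(poll());
        }
        // Drain anything that was offered before pause() was observed.
        Long leftover;
        while ((leftover = input.poll()) != null) {
            processNewData(leftover);
        }
        superstepTotals.add(runningSum); // stand-in for global aggregation
        paused.set(false);               // ready for the next superstep
    }

    public static void main(String[] args) throws InterruptedException {
        StreamingVertexSketch vertex = new StreamingVertexSketch();
        Thread feeder = new Thread(() -> {
            vertex.offer(1);
            vertex.offer(2);
            vertex.offer(3);
            vertex.pause(); // in a real system a timer might do this
        });
        feeder.start();
        vertex.compute();
        feeder.join();
        System.out.println(vertex.superstepTotals());
    }
}
```

In this sketch the pause is what turns an open-ended stream back into
something with a barrier; how a real cluster would coordinate that pause
across workers is left open here.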

In addition to being a pretty radical departure from BSP, you'd have to
make sure that the input sources are able to play nicely with the fault
tolerance of Giraph.  An input source such as an appending HDFS file
would be a good example, I guess.
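
One way to read "play nicely with fault tolerance" is that the source must
be replayable from a position saved at a superstep boundary.  The sketch
below illustrates that idea only; it is an assumption on my part, not how
Giraph checkpoints anything.  ReplayableSourceSketch and its methods are
hypothetical, and an in-memory list stands in for an append-only file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not a Giraph or HDFS API.
class ReplayableSourceSketch {
    // Stand-in for an append-only file: records are only ever added.
    private final List<String> log = new ArrayList<>();
    private int readOffset = 0;         // next record to hand out
    private int checkpointedOffset = 0; // offset saved at the last barrier

    void append(String record) { log.add(record); }

    // Returns the next unread record, or null if the reader has caught up.
    String poll() {
        return readOffset < log.size() ? log.get(readOffset++) : null;
    }

    // Called at a superstep boundary, together with the vertex state.
    void checkpoint() { checkpointedOffset = readOffset; }

    // Called after a failure: rewind so unacknowledged records are replayed.
    void restore() { readOffset = checkpointedOffset; }
}
```

Because nothing is deleted from the log, rewinding to the saved offset
re-delivers exactly the records consumed since the last checkpoint.  A
source that cannot rewind (a raw socket, say) would not fit this pattern.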

  -jake

On Tue, Jan 3, 2012 at 9:33 AM, Sebastian Schelter <s...@apache.org> wrote:

> Hi Prasen,
>
> Storm is supposed to process a continuous stream of data while Giraph is
> a parallel batch processing platform. I think these are inherently
> different systems and one cannot easily be transformed into the other.
>
> -sebastian
>
> On 03.01.2012 17:51, prasenjit mukherjee wrote:
> > I have a use case which maps perfectly with the open source
> > implementation of storm ( by twitter team ). I think Giraph can be
> > easily modified to have an implementation simulating storm's use
> > cases. Just curious, if anybody had similar thoughts.
> >
> > -Prasen
>
>
