I will be using Giraph/Hadoop for other use cases anyway, and I don't
want to install/maintain Storm just for the real-time streaming use
case. I am also thinking of adding real-time logs to HBase and having
Giraph pick up the incremental feeds from HBase based on timestamp.
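
A minimal sketch of the timestamp-based pickup, under stated assumptions: the
TreeMap below is only a stand-in for a timestamp-indexed log table and is NOT
the HBase client API, and the names IncrementalFeed, append and nextBatch are
hypothetical. It shows the high-water-mark idea: each call returns only rows
newer than the last batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration; an in-memory map stands in for the log table.
class IncrementalFeed {
    // Rows keyed by timestamp. In this sketch a second row with the same
    // timestamp overwrites the first; a real table would need a finer key.
    private final NavigableMap<Long, String> rowsByTimestamp = new TreeMap<>();

    // High-water mark: the newest timestamp already handed out.
    private long lastSeen = Long.MIN_VALUE;

    void append(long timestamp, String logLine) {
        rowsByTimestamp.put(timestamp, logLine);
    }

    // Return rows strictly newer than the last batch, then advance the mark.
    List<String> nextBatch() {
        NavigableMap<Long, String> fresh = rowsByTimestamp.tailMap(lastSeen, false);
        List<String> batch = new ArrayList<>(fresh.values());
        if (!fresh.isEmpty()) {
            lastSeen = fresh.lastKey();
        }
        return batch;
    }

    public static void main(String[] args) {
        IncrementalFeed feed = new IncrementalFeed();
        feed.append(100L, "a");
        feed.append(200L, "b");
        System.out.println(feed.nextBatch()); // first pickup: both rows
        feed.append(300L, "c");
        System.out.println(feed.nextBatch()); // second pickup: only the new row
    }
}
```

In a real job the call to nextBatch() would sit wherever the new input is
loaded for the next superstep, and the mark would have to be persisted so a
restarted worker does not re-read or skip rows.
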
On 1/4/12, Avery Ching <ach...@apache.org> wrote:
> Interesting idea. You could actually implement the code to load the new
> input data in preSuperstep(). If the input data is resilient (i.e.
> stored on HDFS), then the system would inherit Giraph's reliability
> guarantees. Implementing an external trigger to stop the application
> wouldn't be too difficult (e.g. dump a file stamp or something and
> check for it every n supersteps). Still, as I'm not that familiar with
> Storm, what would be the advantages of this over Storm?
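
A rough sketch of the stop-file check described above, in plain Java with no
Giraph dependency. The names StopFileCheck, shouldStop and the STOP marker
file are hypothetical, and the loop only simulates supersteps; in Giraph the
check would presumably be called from preSuperstep().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; the loop in main() only simulates supersteps.
class StopFileCheck {
    // True when the run should end. The file system is consulted only on
    // every checkInterval-th superstep, so the check stays cheap.
    static boolean shouldStop(long superstep, int checkInterval, Path stopMarker) {
        return superstep % checkInterval == 0 && Files.exists(stopMarker);
    }

    public static void main(String[] args) throws IOException {
        Path marker = Files.createTempDirectory("giraph-sketch").resolve("STOP");
        int checkInterval = 5;

        for (long superstep = 0; superstep < 100; superstep++) {
            if (superstep == 12) {
                Files.createFile(marker); // simulate an operator dropping the marker
            }
            if (shouldStop(superstep, checkInterval, marker)) {
                System.out.println("stopping at superstep " + superstep);
                break;
            }
            // ... load new input and run the superstep here ...
        }
    }
}
```

With the marker dropped at superstep 12 and a check every 5 supersteps, the
loop notices it at superstep 15, so the stop is delayed by at most
checkInterval - 1 supersteps.
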
> On 1/3/12 5:30 PM, prasenjit mukherjee wrote:
>> As Jake mentioned, you can have continuous processing by making the
>> mappers in Giraph stop based on an external condition (i.e. when
>> specifically asked to do so), and one can call voteToHalt() only if
>> that condition is satisfied.
>> Additionally, the VertexInputSource can be modified to read from a
>> continuous input (like ActiveMQ or even a port), potentially outside
>> of HDFS.
>> On 1/3/12, Sebastian Schelter <s...@apache.org> wrote:
>>> Hi Prasen,
>>> Storm is supposed to process a continuous stream of data while Giraph is
>>> a parallel batch processing platform. I think these are inherently
>>> different systems and one cannot easily be transformed into the other.
>>> On 03.01.2012 17:51, prasenjit mukherjee wrote:
>>>> I have a use case which maps perfectly onto the open source
>>>> implementation of Storm (by the Twitter team). I think Giraph can be
>>>> easily modified to have an implementation simulating Storm's use
>>>> cases. Just curious if anybody has had similar thoughts.