Some replies inline to the issues you outlined.
>1) Giraph runs completely as a MapReduce job on Hadoop today. This needs
to be maintained to support our current users, who will not likely move to
MRv2 for at least a year.
I think what you need is to support Giraph's graph API for your users, not
the underlying implementation. (Or are you leaking MapReduce APIs to your
users?) Sure, you are restricted to the underlying implementation (Hadoop
MRV1, or MRV2 whenever it gets used) at any point in time, but what we are
discussing is _that_ future when the underlying implementation itself also
moves to MRV2.
>2) The internals of Giraph are implemented differently than Hama..
Sure, but only at present. My original question is: given a BSP
implementation on a YARN cluster, can GiraphV2 (BSP-based) simply be
implemented over that or not? If GiraphV1 today uses (its own) BSP
implementation over MapReduce APIs on a Hadoop MRV1 cluster, I can clearly
see how GiraphV2 could use (HAMA's) BSP implementation over YARN APIs.
>3) If we have various graph processing computing models (BSP based,
streams or asynchronous, or a combination), then being on Hama brings little
value for Giraph.
That future isn't there yet. In any case, I'd bet that when you get there, a
lot of what you have now also wouldn't be an out-of-the-box fit.
From my perspective (a third-person POV), this is what I can conclude:
Giraph's velocity on Hadoop MapReduce may be the real impediment to thinking
about possibly sharing the BSP-based implementation with HAMAV2. Sure,
Giraph has other ideas regarding the computation model itself, but that is a
future that isn't here yet.
I just hope the same velocity isn't an impediment to thinking about the
next-gen version on top of YARN :) The way I see it, porting Giraph to YARN
is also a revolution in itself; most, if not all, of the implementation will
change, yet API-level compatibility will remain. I am still eagerly looking
forward to the port of Giraph to YARN. Maybe more digging into Giraph
internals will help my cause too.
If nothing else, this discussion at least helped share some ideas between
the two communities.
Thanks all for putting down your thoughts.
On Wed, Sep 14, 2011 at 11:46 AM, Thomas Jungblut <
> We are also thinking about other underlying computing models (i.e.
>> streaming (asynchronous) graph processing - see
> That is a really cool idea. But I don't think we are going to focus solely
> on graph computing. We want to enable an interface which can be used for it
> (straightforward, as described in the Pregel paper), but I think you are
> really the graph experts, so we don't want to compete with each other :D
> Our asynchronous processing (in my opinion) will just enable the sending of
> messages within the computation phase. So the BarrierSync is just a little
> transition to make sure every task is ready and every message has been sent.
> Your Vertex locking is a graph-only feature; this won't be affecting us
> Giraph runs completely as a MapReduce job on Hadoop today.
> I think our result is the following:
> We (Apache Hama) are focussing on the YARN implementation of the BSP
> If you want to run Giraph on a real BSP engine later, feel free to put your
> stuff on top of that.
> As far as I have seen, YARN is 100% backward compatible, so
> your current solution will run on YARN as well.
> Best Regards,