Avery,

Some replies inline to the issues you outlined.

> 1) Giraph runs completely as a MapReduce job on Hadoop today. This needs to be maintained to support our current users, who will not likely move to MRv2 for at least a year.

I think what you need is to support Giraph's graph API for your users, not the underlying implementation. (Or are you leaking MapReduce APIs to your users?) Sure, you are restricted to the underlying implementation (Hadoop MRv1, or MRv2 whenever it gets used) at any point in time, but what we are discussing is _that_ future when the underlying implementation itself also moves to MRv2.

> 2) The internals of Giraph are implemented differently than Hama..

Sure, but only at present. My original question is: given a BSP implementation on a YARN cluster, can GiraphV2 (BSP-based) simply be implemented over that or not? If GiraphV1 today uses (its own) BSP implementation over MapReduce APIs on a Hadoop MRv1 cluster, I can clearly see how GiraphV2 could use (HAMA's) BSP implemented over YARN APIs.

> 3) If we have various graph processing computing models (BSP based, streams or asynchronous, or a combination), then being on Hama brings little value for Giraph.

That future isn't there yet. In any case, I'd bet that when you get there, a lot of what you have now also wouldn't be an out-of-the-box fit.

From my perspective (a third-person POV), this is what I can conclude. Giraph's velocity on Hadoop MapReduce may be the real impediment to thinking about possibly sharing the BSP-based implementation with HAMAV2. Sure, Giraph has other ideas regarding the computation model itself, but that is a future that isn't here yet. I just hope the same velocity isn't an impediment to thinking about the next-gen version on top of YARN :)

The way I see it, porting Giraph to YARN is also a revolution in itself; most, if not all, of the implementation will change while API-level compatibility is kept. I am still eagerly looking forward to the port of Giraph to YARN.

Maybe more digging into Giraph internals will help my cause too. If nothing else, this discussion at least helped share some ideas between the two communities.

Thanks, all, for putting down your thoughts.

+Vinod

On Wed, Sep 14, 2011 at 11:46 AM, Thomas Jungblut <thomas.jungb...@googlemail.com> wrote:

>> We are also thinking about other underlying computing models (i.e.
>> streaming (asynchronous) graph processing - see
>
> That is a really cool idea. But I don't think we are going to focus solely
> on graph computing. We want to enable an interface which can be used for it
> (straightforward, as described in the Pregel paper), but I think you are
> really the graph experts, so we don't want to compete with each other :D
> Our asynchronous processing (in my opinion) will just enable the sending of
> messages within the computation phase. So the BarrierSync is just a little
> transition to make sure every task is ready and every message has been sent.
> Your Vertex locking is a graph-only feature; this won't be affecting us
> anyway.
>
>> Giraph runs completely as a MapReduce job on Hadoop today.
>
> All right.
>
> I think our result is the following:
> We (Apache Hama) are focusing on the YARN implementation of the BSP
> paradigm.
> If you want to run Giraph on a real BSP engine later, feel free to put your
> stuff on top of that.
> As far as I have seen, YARN is 100% backward compatible, so
> your current solution will run on YARN as well.
>
> Best Regards,
>
> Thomas