Edward and I have chatted about this at times. I think it sounds better
in theory (both projects being BSP-based and both adding support for
MRv2) than in practice (the underlying implementations are quite
different). Actually, I also believe that in the future, Giraph will not
be solely BSP-based graph computing. We are also thinking about other
underlying computing models (e.g. streaming (asynchronous) graph processing - see
But I think today, the issues are the following:
1) Giraph runs completely as a MapReduce job on Hadoop today. This
needs to be maintained to support our current users, who are not likely
to move to MRv2 for at least a year.
2) The internals of Giraph are implemented differently from Hama's, and
porting them would take some time.
3) If we have various graph processing computing models (BSP-based,
streaming or asynchronous, or a combination), then building on Hama
brings little value to Giraph.
Perhaps more practically, I wonder if it would be possible for someone
from the Hama team to refactor our code a bit to support Hama-style BSP
in Giraph? It would certainly be a pretty cool project...
On 9/13/11 4:49 AM, Edward J. Yoon wrote:
Quite a while ago, I implemented a clone of Google Pregel simply using
BSPLib and decided to focus on BSP computing engine.
The Hama and Giraph projects differ in slogan but not in kind.
If we collaborate, Giraph should be implemented on top of the
Hama BSP computing engine.
Otherwise, we will be back to square one.
On Sun, Sep 11, 2011 at 11:22 PM, Vinod Kumar Vavilapalli wrote:
Cross-posting to hama-dev and giraph-dev.
Just this morning I was looking at HAMA-431, the port of Hama to YARN.
Then one of the tweets reminded me of JIRA issue GIRAPH-13, which is
about porting Giraph to YARN.
I was also looking at the Giraph proposal for entry into the Apache
Incubator. There is an interesting section in it:
Relationships with Other Apache Products
Giraph has some overlapping functionality with Apache Hama. However, there
are some significant differences. Giraph focuses on graph-based bulk
synchronous parallel (BSP) computing, while Apache Hama is more for
general-purpose BSP computing. Giraph runs on the Hadoop infrastructure,
while Apache Hama uses its own computing framework.
I agree with the point about Hama being general-purpose BSP and Giraph
being completely graph-oriented. But the latter point, about the
infrastructure, is going to be moot with both Giraph and Hama being
ported over to YARN.
So here's my billion-dollar question: is it possible to implement Giraph's
graph-based APIs over Hama's BSP APIs, with both running over a single
Apache BSP implementation on YARN?
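To make the question above concrete, here is a minimal Python sketch of what "graph APIs over a general BSP API" could look like. It assumes a hypothetical general-purpose BSP runtime that offers only send() and a sync() barrier, and layers a Pregel-style vertex API on top of it (compute() per vertex, vote_to_halt(), messages from superstep S delivered in superstep S+1). The names BSPRuntime, Vertex, run_graph_job and max_value are illustrative only; they are not the actual Hama or Giraph APIs.

```python
from collections import defaultdict


class BSPRuntime:
    """Hypothetical general-purpose BSP engine (NOT Hama's real API)."""

    def __init__(self):
        self._outbox = defaultdict(list)

    def send(self, dest, msg):
        # Buffer the message; it is not visible until the next barrier.
        self._outbox[dest].append(msg)

    def sync(self):
        # Barrier: messages sent in superstep S are returned here and
        # delivered to their destinations in superstep S+1.
        inbox, self._outbox = self._outbox, defaultdict(list)
        return inbox


class Vertex:
    """Hypothetical Pregel-style vertex (NOT Giraph's real API)."""

    def __init__(self, vid, value, edges):
        self.id = vid
        self.value = value
        self.edges = edges
        self.halted = False

    def vote_to_halt(self):
        self.halted = True


def run_graph_job(vertices, compute, max_supersteps=30):
    """Drive a vertex-centric computation using only send() and sync()."""
    runtime = BSPRuntime()
    inbox = defaultdict(list)
    for superstep in range(max_supersteps):
        # A vertex is active unless it halted and has no new messages.
        active = [v for v in vertices.values()
                  if not v.halted or inbox.get(v.id)]
        if not active:
            break
        for v in active:
            v.halted = False  # an incoming message reactivates a vertex
            compute(v, inbox.get(v.id, []), superstep, runtime.send)
        inbox = runtime.sync()
    return {vid: v.value for vid, v in vertices.items()}


def max_value(vertex, messages, superstep, send):
    """Example compute(): propagate the maximum value through the graph."""
    new_max = max([vertex.value] + messages)
    if superstep == 0 or new_max > vertex.value:
        vertex.value = new_max
        for target in vertex.edges:
            send(target, vertex.value)
    vertex.vote_to_halt()


graph = {
    1: Vertex(1, 3, [2]),
    2: Vertex(2, 6, [1, 3]),
    3: Vertex(3, 2, [2, 4]),
    4: Vertex(4, 1, [3]),
}
result = run_graph_job(graph, max_value)
print(result)
```

In this sketch the graph layer needs nothing from the BSP layer beyond message passing and a barrier, which is one argument that the layering is feasible; real systems would also need partitioning, checkpointing and combiners, which are not shown here.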
I also see the email thread about Hama and Giraph's future collaboration
once Hadoop NextGen (aka YARN) arrives:
http://s.apache.org/HamaVsGiraph. So are we ready for this yet?
Disclaimer: I come from the Hadoop world and have no idea of Giraph's APIs
or internals, except that I see a bsp package in Giraph's source tree. I do
know a tiny bit about Hama's APIs and internals, but my expertise is only
two days old.
(An elephant maintainer trying to see if a Giraffe can be made to ride over
a hippopotamus riding over an elephant)