Cross-posting to hama-dev and giraph-dev.
Just this morning I was looking at HAMA-431, the port of Hama to YARN, and
one of the tweets reminded me of JIRA issue GIRAPH-13, which is about
porting Giraph to YARN.
I was also looking at the Giraph proposal for entry into the Apache Incubator.
There is an interesting section there:
Relationships with Other Apache Products
Giraph has some overlapping functionality with Apache Hama. However, there
are some significant differences. Giraph focuses on graph-based bulk
synchronous parallel (BSP) computing, while Apache Hama is more for general
purposed BSP computing. Giraph runs on the Hadoop infrastructure, while
Apache Hama uses its own computing framework.
I agree with the point about Hama being a general-purpose BSP framework and
Giraph being completely graph oriented. But the latter point, about the
infrastructure, is going to be moot with both Giraph and Hama being ported
over to YARN.
So here's my billion-dollar question: Is it possible to implement Giraph's
graph-based APIs on top of Hama's BSP APIs, with both running over a single
Apache BSP implementation on YARN?
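To make the question concrete, here is a toy, single-process sketch of the
layering I have in mind. Everything in it is hypothetical: ToyBSPPeer,
run_vertex_program and propagate_max are names I made up for illustration,
and none of this is Hama's or Giraph's actual API. The only point is that a
vertex-centric "compute" loop can be driven with nothing more than the
send-and-sync primitives of a general BSP layer.

```python
from collections import defaultdict


class ToyBSPPeer:
    """Hypothetical general BSP layer: send messages, then sync at a barrier."""

    def __init__(self):
        self._outbox = defaultdict(list)
        self._inbox = defaultdict(list)

    def send(self, target, message):
        self._outbox[target].append(message)

    def sync(self):
        # Barrier: messages sent in this superstep become readable in the next.
        self._inbox = self._outbox
        self._outbox = defaultdict(list)

    def received(self, target):
        return self._inbox.get(target, [])

    def has_pending(self):
        return bool(self._inbox)


def run_vertex_program(peer, edges, values, compute, max_supersteps=50):
    """Hypothetical graph layer built only on the peer's send() and sync()."""
    for superstep in range(max_supersteps):
        for vertex in values:
            messages = peer.received(vertex)
            if superstep > 0 and not messages:
                continue  # vertex stays idle until a message arrives
            values[vertex], outgoing = compute(superstep, values[vertex], messages)
            if outgoing is not None:
                for neighbour in edges.get(vertex, []):
                    peer.send(neighbour, outgoing)
        peer.sync()
        if not peer.has_pending():
            break  # no messages in flight, so every vertex has halted
    return values


def propagate_max(superstep, value, messages):
    """Example vertex program: every vertex converges to the largest value."""
    best = max([value] + messages)
    changed = superstep == 0 or best > value
    return best, (best if changed else None)


edges = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = run_vertex_program(ToyBSPPeer(), edges, {"a": 3, "b": 6, "c": 2}, propagate_max)
```

In this toy the graph layer never touches anything below the peer, which is
the property I am asking about; whether the real APIs line up this cleanly is
exactly what I don't know.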
I also see the email thread about Hama and Giraph's future collaboration
once Hadoop NextGen, aka YARN, comes in:
http://s.apache.org/HamaVsGiraph. So are we ready for this yet?
Disclaimer: I come from the Hadoop world and have no idea of Giraph's APIs
or internals, except that I see a bsp package in Giraph's source tree. I do
know a tiny bit about Hama's APIs and internals, but my expertise is only
two days old.
(An elephant maintainer trying to see if a Giraffe can be made to ride over
a hippopotamus riding over an elephant)