Re: Comparing BSP and MR

Avery Ching Thu, 08 Dec 2011 23:21:44 -0800

Hi Praveen,

Answers inline.  Hope that helps!


Avery

On 12/8/11 10:16 PM, Praveen Sripati wrote:

Hi,
I know MapReduce/Hadoop and am trying to get my head around BSP/Hama-Giraph by comparing MR and BSP.
- The Map Phase in MR is similar to the Computation Phase in BSP. BSP allows processes to exchange data in the communication phase, but there is no communication between the mappers in the Map Phase, though data does flow from Map tasks to Reducer tasks. Please correct me if I am wrong. Any other significant differences?

I suppose you can think of it that way. I like to compare a BSP superstep to a MapReduce job, since both consist of computation and communication.
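To make the superstep idea concrete, here is a minimal single-process sketch in Python. It is not Giraph or Hama code: `run_bsp`, the graph layout, and the max-value example are all assumptions chosen for illustration. Each loop iteration is one superstep: every vertex that is active or has incoming messages computes, outgoing messages are collected, and they only become visible after the barrier at the end of the iteration.

```python
from collections import defaultdict


def run_bsp(values, edges, max_supersteps=50):
    """Toy BSP loop: propagate the maximum value through a graph.

    values: dict mapping vertex id -> initial value (hypothetical input)
    edges:  dict mapping vertex id -> list of neighbor ids
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)
    active = set(values)          # every vertex starts active
    inbox = defaultdict(list)     # messages readable in the current superstep
    superstep = 0
    while (active or inbox) and superstep < max_supersteps:
        outbox = defaultdict(list)
        # Computation phase: a vertex runs if it is active or was sent a message.
        for v in sorted(active | set(inbox)):
            new_value = max([values[v]] + inbox.get(v, []))
            if superstep == 0 or new_value > values[v]:
                values[v] = new_value
                # Communication phase: messages are queued, not delivered yet.
                for neighbor in edges.get(v, []):
                    outbox[neighbor].append(new_value)
        # Every vertex votes to halt; only a new message reactivates it.
        active = set()
        # Barrier: queued messages become the next superstep's inbox.
        inbox = outbox
        superstep += 1
    return values, superstep


if __name__ == "__main__":
    chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(run_bsp({1: 3, 2: 6, 3: 2, 4: 1}, chain))
```

The loop stops once no vertex is active and no messages are in flight, which mirrors how a BSP job ends after its last superstep.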

- After going through the documentation for Hama and Giraph, I noticed that they both use Hadoop as the underlying framework. In both Hama and Giraph an MR Job is submitted. Does each superstep in BSP correspond to a Job in MR? Where are the incoming and outgoing messages and state stored - HDFS or HBase or Local or pluggable?

My understanding of Hama is that they have their own BSP framework. Giraph can be run on a Hadoop installation; it does not have its own computational framework. A Giraph job is submitted to a Hadoop installation as a Map-only job. Hama will have its own BSP launching framework.

In Giraph, all state is stored in memory. Graphs are loaded/stored through VertexInputFormat/VertexOutputFormat (very similar to Hadoop). You could implement your own VertexInputFormat/VertexOutputFormat to use HDFS, HBase, etc. as your graph stable storage.

- If a Vertex is deactivated and then activated again after receiving a message, does it run on the same node or a different node in the cluster?

In Giraph, vertices can move between workers from one superstep to the next. A vertex will run on the worker that it is currently assigned to.
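As a rough illustration of that last point, the sketch below uses a made-up modulo partitioning. `assign_vertices` and `run_superstep` are hypothetical names, and this is not how Giraph actually partitions a graph. The point is only that a reactivated vertex computes on whichever worker owns it in the current superstep, which need not be the worker that ran it before.

```python
def assign_vertices(vertex_ids, num_workers):
    """Hypothetical partitioning: vertex id modulo the worker count."""
    return {v: v % num_workers for v in vertex_ids}


def run_superstep(assignment, inboxes):
    """Group the vertices that must compute by the worker that owns them now.

    assignment: dict mapping vertex id -> worker id for this superstep
    inboxes:    dict mapping vertex id -> list of pending messages
    """
    work = {}
    for vertex, messages in inboxes.items():
        if messages:  # a pending message reactivates a halted vertex
            work.setdefault(assignment[vertex], []).append(vertex)
    return work


if __name__ == "__main__":
    before = assign_vertices(range(6), 3)  # assignment in an earlier superstep
    after = assign_vertices(range(6), 2)   # assignment after a repartition
    print(before[5], after[5])
    print(run_superstep(after, {5: [42], 4: []}))
```

Here vertex 5 is owned by a different worker after the repartition, and its pending message is handled by the new owner.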

Regards,
Praveen
