Re: Comparing BSP and MR

Jake Mannix Fri, 09 Dec 2011 21:06:41 -0800

On Fri, Dec 9, 2011 at 8:16 PM, Praveen Sripati <[email protected]>wrote:


> Jake,
>
>
> > Let's not crosspost, please, it make the thread of conversation totally
> opaque as to who is talking about what.
>
> Agree. I got it after the OP.
>
>
> > There is only one set of map tasks for the Giraph job - those
> long-running map tasks run possibly many supersteps.
>
> OK. But, map tasks don't communicate with each other. How are messages
> sent across in the communication phase of a super step that happens within
> a map?
>

Giraph maps do communicate: via RPC.  This is done repeatedly in every
mapper, during the compute phase.  This is something that is not normal to
MapReduce, it is special to Giraph.


> > In Giraph, vertices can move around workers between supersteps.  A
> vertex will run on the worker that it is assigned to.
>
> Is there any advantage of moving the processing of vertices from one
> worker to another. Can't there be affinity between a worker and the
> vertices it processes?
>

Often there will be affinity, but if the graph itself evolves during
computation (some sort of iterative pruning or clustering), then moving
around may make sense.  Also: if nodes die.

  -jake


> Regards,
> Praveen
>
> On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix <[email protected]>wrote:
>
>> [hama-user to bcc:]
>>
>> Let's not crosspost, please, it make the thread of conversation totally
>> opaque as to who is talking about what.
>>
>> On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati <[email protected]
>> > wrote:
>>
>>> Thanks to Thomas and Avery for the response.
>>>
>>> > For Giraph you are quite correct, all the stuff is submitted as a MR
>>> job. But a full map stage is not a superstep, the whole computation is a
>>> done in one mapping phase.
>>>
>>> So a map task in MR corresponds to a computation phase in a superstep.
>>> Once the computation phase for a superstep is complete, the vertex output
>>> is stored using the defined OutputFormat, the message sent (may be) to
>>> another vertex and the map task is stopped. Once the barrier
>>> synchronization phase is complete, another set of map tasks are invoked for
>>> the vertices which have received a message.
>>>
>>
>> In Giraph, each superstep does not lead to storage into an OutputFormat.
>>  The data lives all in memory from the time the first superstep starts to
>> the time the final superstep stops (except that for tolerance of failures,
>> checkpoints are stored to disk at user-specified intervals).  There is only
>> one set of map tasks for the Giraph job - those long-running map tasks run
>> possibly many supersteps.
>>
>>
>>> In a regular MR Job (not Giraph) the number of Map tasks equals to the
>>> number of InputSplits. But, in case of Giraph the total number of maps to
>>> be launched is usually more than the number of input vertices.
>>>
>>
>> Number of maps > number of input vertices?  Not at all.  That would be
>> insane.  We want to be able to run over multi-billion vertex graphs.  We're
>> going to launch multiple billions of mappers?   The splitting of the data
>> in Giraph is very similar to in a regular MR job, divide up your input data
>> among the number of mappers you have, and you're off and running.
>>
>>
>>>
>>> > Where are the incoming, outgoing messages and state stored
>>> > Memory
>>>
>>> What happens if a particular node is lost in case of Hama and Giraph?
>>> Are the messages not persisted somewhere to be fetched later.
>>
>>
>> If nodes are lost, the system has to back up to the most recent
>> checkpoint, where graph state has been persisted to HDFS.  Messages are not
>> currently persisted, but the state at which the graph was in to produce any
>> messages was.
>>
>>
>>> > In Giraph, vertices can move around workers between supersteps.  A
>>> vertex will run on the worker that it is assigned to.
>>>
>>> Is data locality considered while moving vertices around workers in
>>> Giraph?
>>>
>>
>> Data is all in memory, and typical graph algorithms are basically sending
>> roughly the size of the entire graph (number of total edges) out over
>> distributed RPC in any given superstep, so shuffling the graph around by
>> RPC is not much more to do.
>>
>>
>>>
>>> > As you can see, you could write a MapReduce Engine with BSP on top of
>>> Apache Hama.
>>>
>>> It's being the done other way, BSP is implemented in Giraph using Hadoop.
>>
>>
>> I'll let the Hama people explain to you about how one would implement MR
>> on top of Hama.  You are correct that in Giraph, the Hadoop
>> JobTracker/TaskTracker and HDFS are used as substrate to help implement BSP
>> (although I would not say that "MR" is being used to implement BSP, as
>> there is no MR going on in Giraph).
>>
>>   -jake
>>
>>
>>>
>>>
>>> Praveen
>>>
>>> On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching <[email protected]> wrote:
>>>
>>>>  Hi Praveen,
>>>>
>>>> Answers inline.  Hope that helps!
>>>>
>>>> Avery
>>>>
>>>> On 12/8/11 10:16 PM, Praveen Sripati wrote:
>>>>
>>>> Hi,
>>>>
>>>> I know about MapReduce/Hadoop and trying to get myself around
>>>> BSP/Hama-Giraph by comparing MR and BSP.
>>>>
>>>> - Map Phase in MR is similar to Computation Phase in BSP. BSP allows
>>>> for process to exchange data in the communication phase, but there is no
>>>> communication between the mappers in the Map Phase. Though the data flows
>>>> from Map tasks to Reducer tasks. Please correct me if I am wrong. Any other
>>>> significant differences?
>>>>
>>>> I suppose you can think of it that way.  I like to compare a BSP
>>>> superstep to a MapReduce job since it's computation and communication.
>>>>
>>>> - After going through the documentation for Hama and Giraph, noticed
>>>> that they both use Hadoop as the underlying framework. In both Hama and
>>>> Giraph an MR Job is submitted. Does each superstep in BSP correspond to a
>>>> Job in MR? Where are the incoming, outgoing messages and state stored -
>>>> HDFS or HBase or Local or pluggable?
>>>>
>>>>  My understanding of Hama is that they have their own BSP framework.
>>>> Giraph can be run on a Hadoop installation, it does not have its own
>>>> computational framework.  A Giraph job is submitted to a Hadoop
>>>> installation as a Map-only job.  Hama will have its own BSP lauching
>>>> framework.
>>>>
>>>> In Giraph, the state is stored all in memory.  Graphs are loaded/stored
>>>> through VertexInputFormat/VertexOutputFormat (very similar to Hadoop).  You
>>>> could implement your own VertexInputFormat/VertexOutputFormat to use HDFS,
>>>> HBase, etc. as your graph stable storage.
>>>>
>>>> - If a Vertex is deactivated and again activated after receiving a
>>>> message, does is run on the same node or a different node in the cluster?
>>>>
>>>>  In Giraph, vertices can move around workers between supersteps.  A
>>>> vertex will run on the worker that it is assigned to.
>>>>
>>>> Regards,
>>>> Praveen
>>>>
>>>>
>>>>
>>>
>>
>

Re: Comparing BSP and MR

Reply via email to