> There must definitely have been some thought around this.

Yes, there was some thought put into the design before writing - in Hadoop's case - hundreds of thousands of lines of code, and, for Giraph, thousands so far. I invite you to read the MapReduce paper (http://research.google.com/archive/mapreduce.html) and the Pregel paper (http://dl.acm.org/citation.cfm?id=1807184) to learn more about what prompted these designs.
On Sat, Dec 10, 2011 at 9:14 AM, Avery Ching <ach...@apache.org> wrote:
> On 12/9/11 10:22 PM, Praveen Sripati wrote:
>
> Jake,
>
>> Giraph maps do communicate: via RPC. This is done repeatedly in every mapper, during the compute phase. This is something that is not normal to MapReduce, it is special to Giraph.
>
> There must definitely have been some thought around this. But, we can also have a mapper correspond to just the computation phase in a superstep and avoid communication between the mappers, as in MapReduce. Later spawn another set of mappers for the next superstep. There might be some reason why communication between mappers was avoided in MR.
>
> Communication between mappers is not part of the MapReduce computing model. Therefore, it doesn't make sense for them to include it, as it would unnecessarily complicate the fault-tolerance recovery.
>
> Any thoughts?
>
> Regards,
> Praveen
>
> On Sat, Dec 10, 2011 at 10:35 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>
>> On Fri, Dec 9, 2011 at 8:16 PM, Praveen Sripati <praveensrip...@gmail.com> wrote:
>>>
>>> Jake,
>>>
>>> > Let's not crosspost, please, it makes the thread of conversation totally opaque as to who is talking about what.
>>>
>>> Agree. I got it after the OP.
>>>
>>> > There is only one set of map tasks for the Giraph job - those long-running map tasks run possibly many supersteps.
>>>
>>> OK. But, map tasks don't communicate with each other. How are messages sent across in the communication phase of a superstep that happens within a map?
>>
>> Giraph maps do communicate: via RPC. This is done repeatedly in every mapper, during the compute phase. This is something that is not normal to MapReduce, it is special to Giraph.
>>
>>> > In Giraph, vertices can move around workers between supersteps. A vertex will run on the worker that it is assigned to.
>>>
>>> Is there any advantage in moving the processing of vertices from one worker to another? Can't there be affinity between a worker and the vertices it processes?
>>
>> Often there will be affinity, but if the graph itself evolves during computation (some sort of iterative pruning or clustering), then moving around may make sense. Also: if nodes die.
>>
>> -jake
>>
>>> Regards,
>>> Praveen
>>>
>>> On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix <jake.man...@gmail.com> wrote:
>>>>
>>>> [hama-user to bcc:]
>>>>
>>>> Let's not crosspost, please, it makes the thread of conversation totally opaque as to who is talking about what.
>>>>
>>>> On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati <praveensrip...@gmail.com> wrote:
>>>>>
>>>>> Thanks to Thomas and Avery for the response.
>>>>>
>>>>> > For Giraph you are quite correct, all the stuff is submitted as a MR job. But a full map stage is not a superstep, the whole computation is done in one mapping phase.
>>>>>
>>>>> So a map task in MR corresponds to a computation phase in a superstep. Once the computation phase for a superstep is complete, the vertex output is stored using the defined OutputFormat, the message sent (maybe) to another vertex and the map task is stopped. Once the barrier synchronization phase is complete, another set of map tasks is invoked for the vertices which have received a message.
>>>>
>>>> In Giraph, each superstep does not lead to storage into an OutputFormat. The data all lives in memory from the time the first superstep starts to the time the final superstep stops (except that for tolerance of failures, checkpoints are stored to disk at user-specified intervals). There is only one set of map tasks for the Giraph job - those long-running map tasks run possibly many supersteps.
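The execution model Jake describes above - one set of long-running workers holding the graph in memory, running superstep after superstep, exchanging messages only at the barrier, and checkpointing at intervals for fault tolerance - can be sketched in a few lines. This is a toy, single-process Python illustration; the names (`run_supersteps`, `count_compute`, and so on) are invented for this sketch and are not Giraph's actual API.

```python
import copy

# Toy single-process sketch of the execution model described above: vertex
# state stays in memory across supersteps, messages sent during the compute
# phase are buffered and only delivered at the barrier, and state is
# checkpointed every few supersteps so a lost worker can be recovered.
def run_supersteps(vertices, compute, num_supersteps,
                   checkpoint_interval=2, fail_at=None):
    """vertices: {vid: value}; compute(vid, value, messages) ->
    (new_value, [(dest_vid, message), ...])."""
    vertices = copy.deepcopy(vertices)        # don't mutate the caller's dict
    inbox = {vid: [] for vid in vertices}
    checkpoint = (0, copy.deepcopy((vertices, inbox)))
    superstep = 0
    failed = False
    while superstep < num_supersteps:
        if fail_at is not None and superstep == fail_at and not failed:
            failed = True                     # simulate losing a worker:
            superstep, (vertices, inbox) = checkpoint[0], copy.deepcopy(checkpoint[1])
            continue                          # back up to the checkpoint and replay
        outbox = {vid: [] for vid in vertices}
        for vid in vertices:
            new_value, outgoing = compute(vid, vertices[vid], inbox[vid])
            vertices[vid] = new_value
            for dest, msg in outgoing:
                outbox[dest].append(msg)      # buffered "RPC" send
        inbox = outbox                        # barrier: all messages delivered at once
        superstep += 1
        if superstep % checkpoint_interval == 0:
            checkpoint = (superstep, copy.deepcopy((vertices, inbox)))
    return vertices

# Example: every vertex "greets" the two others once per superstep. With 4
# supersteps, greetings sent in supersteps 1-3 are received in supersteps 2-4,
# so each vertex ends up with 3 * 2 = 6, with or without a simulated failure
# (deterministic compute plus checkpoint replay gives the same answer).
def count_compute(vid, value, messages):
    return value + len(messages), [(d, 1) for d in (0, 1, 2) if d != vid]

print(run_supersteps({0: 0, 1: 0, 2: 0}, count_compute, 4))             # {0: 6, 1: 6, 2: 6}
print(run_supersteps({0: 0, 1: 0, 2: 0}, count_compute, 4, fail_at=3))  # {0: 6, 1: 6, 2: 6}
```

Note how the barrier shows up as the single `inbox = outbox` swap: nothing sent in a superstep is visible until the next one, which is exactly why a regular MR map phase (no mid-phase communication) needed to be extended for this model.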
>>>>
>>>>> In a regular MR Job (not Giraph) the number of Map tasks equals the number of InputSplits. But, in the case of Giraph the total number of maps to be launched is usually more than the number of input vertices.
>>>>
>>>> Number of maps > number of input vertices? Not at all. That would be insane. We want to be able to run over multi-billion vertex graphs. We're going to launch multiple billions of mappers? The splitting of the data in Giraph is very similar to that in a regular MR job: divide up your input data among the number of mappers you have, and you're off and running.
>>>>
>>>>> > Where are the incoming, outgoing messages and state stored
>>>>> > Memory
>>>>>
>>>>> What happens if a particular node is lost in the case of Hama and Giraph? Are the messages not persisted somewhere to be fetched later?
>>>>
>>>> If nodes are lost, the system has to back up to the most recent checkpoint, where graph state has been persisted to HDFS. Messages are not currently persisted, but the state the graph was in to produce any messages was.
>>>>
>>>>> > In Giraph, vertices can move around workers between supersteps. A vertex will run on the worker that it is assigned to.
>>>>>
>>>>> Is data locality considered while moving vertices around workers in Giraph?
>>>>
>>>> Data is all in memory, and typical graph algorithms basically send roughly the size of the entire graph (the total number of edges) out over distributed RPC in any given superstep, so shuffling the graph around by RPC is not much extra work.
>>>>
>>>>> > As you can see, you could write a MapReduce Engine with BSP on top of Apache Hama.
>>>>>
>>>>> It's being done the other way around: BSP is implemented in Giraph using Hadoop.
>>>>
>>>> I'll let the Hama people explain to you how one would implement MR on top of Hama.
>>>> You are correct that in Giraph, the Hadoop JobTracker/TaskTracker and HDFS are used as a substrate to help implement BSP (although I would not say that "MR" is being used to implement BSP, as there is no MR going on in Giraph).
>>>>
>>>> -jake
>>>>
>>>>> Praveen
>>>>>
>>>>> On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching <ach...@apache.org> wrote:
>>>>>>
>>>>>> Hi Praveen,
>>>>>>
>>>>>> Answers inline. Hope that helps!
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>> On 12/8/11 10:16 PM, Praveen Sripati wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I know about MapReduce/Hadoop and am trying to get my head around BSP/Hama-Giraph by comparing MR and BSP.
>>>>>>
>>>>>> - Map Phase in MR is similar to Computation Phase in BSP. BSP allows processes to exchange data in the communication phase, but there is no communication between the mappers in the Map Phase, though data does flow from Map tasks to Reduce tasks. Please correct me if I am wrong. Any other significant differences?
>>>>>>
>>>>>> I suppose you can think of it that way. I like to compare a BSP superstep to a MapReduce job since it's computation and communication.
>>>>>>
>>>>>> - After going through the documentation for Hama and Giraph, I noticed that they both use Hadoop as the underlying framework. In both Hama and Giraph an MR Job is submitted. Does each superstep in BSP correspond to a Job in MR? Where are the incoming, outgoing messages and state stored - HDFS or HBase or Local or pluggable?
>>>>>>
>>>>>> My understanding of Hama is that they have their own BSP framework. Giraph can be run on a Hadoop installation; it does not have its own computational framework. A Giraph job is submitted to a Hadoop installation as a Map-only job. Hama will have its own BSP launching framework.
>>>>>>
>>>>>> In Giraph, the state is all stored in memory.
>>>>>> Graphs are loaded/stored through VertexInputFormat/VertexOutputFormat (very similar to Hadoop). You could implement your own VertexInputFormat/VertexOutputFormat to use HDFS, HBase, etc. as your graph's stable storage.
>>>>>>
>>>>>> - If a Vertex is deactivated and again activated after receiving a message, does it run on the same node or a different node in the cluster?
>>>>>>
>>>>>> In Giraph, vertices can move around workers between supersteps. A vertex will run on the worker that it is assigned to.
>>>>>>
>>>>>> Regards,
>>>>>> Praveen
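To tie the vertex activation/deactivation discussion together, here is roughly what a vertex-centric program looks like in this model: the classic maximum-value propagation example from the Pregel paper, written as a framework-free Python sketch rather than against Giraph's actual Vertex API (the function and parameter names here are my own). Each vertex repeatedly takes the max of its incoming messages, tells its neighbours when its value changes, and votes to halt; an incoming message reactivates a halted vertex, matching the deactivate/reactivate behaviour asked about above.

```python
# Pregel-style maximum-value propagation: every vertex ends up holding the
# largest value in the (strongly connected) graph. A vertex votes to halt
# after each superstep and is woken up only if a message arrives for it.
def propagate_max(values, edges, max_supersteps=50):
    """values: {vertex: number}; edges: {vertex: [neighbour, ...]} (directed)."""
    values = dict(values)                 # don't mutate the caller's dict
    inbox = {v: [] for v in values}
    active = set(values)                  # every vertex starts active
    for step in range(max_supersteps):
        if not active:
            break                         # all vertices have voted to halt
        outbox = {v: [] for v in values}
        for v in active:
            new_value = max([values[v]] + inbox[v])
            if step == 0 or new_value > values[v]:
                values[v] = new_value     # value changed: tell the neighbours
                for n in edges.get(v, []):
                    outbox[n].append(new_value)
        inbox = outbox                    # barrier between supersteps
        active = {v for v, msgs in inbox.items() if msgs}  # messages reactivate
    return values

# A 3-vertex directed ring: the maximum (6) spreads to every vertex, and the
# computation halts on its own once no vertex has anything new to say.
print(propagate_max({1: 3, 2: 6, 3: 2}, {1: [2], 2: [3], 3: [1]}))  # {1: 6, 2: 6, 3: 6}
```

In Giraph the same per-vertex logic would live in a compute method on a Vertex, with the framework supplying message delivery, the barrier, and worker assignment; this sketch only shows the control flow.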