Re: Comparing BSP and MR

Avery Ching Sat, 10 Dec 2011 09:14:44 -0800

On 12/9/11 10:22 PM, Praveen Sripati wrote:

Jack,
> Giraph maps do communicate: via RPC. This is done repeatedly inevery mapper, during the compute phase. This is something that is notnormal to MapReduce, it is special to Giraph.
There must have been definitely some thought around this. But, we canalso have a mapper correspond to just the computation phase in asuperstep and avoid communication between the mappers as in MapReduce.Later spawn another set of mappers for the next superset. There mightbe some reason why communication between mappers was avoided in MR.

Communication between mappers is not part of the MapReduce computingmodel. Therefore, it doesn't make sense for them to include it as itwould unnecessarily complicate the fault-tolerance recovery.

Any thoughts?

Regards,
Praveen

On Sat, Dec 10, 2011 at 10:35 AM, Jake Mannix <jake.man...@gmail.com<mailto:jake.man...@gmail.com>> wrote:




    On Fri, Dec 9, 2011 at 8:16 PM, Praveen Sripati
    <praveensrip...@gmail.com <mailto:praveensrip...@gmail.com>> wrote:

        Jake,


        > Let's not crosspost, please, it make the thread of
        conversation totally opaque as to who is talking about what.

        Agree. I got it after the OP.


        > There is only one set of map tasks for the Giraph job -
        those long-running map tasks run possibly many supersteps.

        OK. But, map tasks don't communicate with each other. How are
        messages sent across in the communication phase of a super
        step that happens within a map?


    Giraph maps do communicate: via RPC.  This is done repeatedly in
    every mapper, during the compute phase.  This is something that is
    not normal to MapReduce, it is special to Giraph.

        > In Giraph, vertices can move around workers between
        supersteps.  A vertex will run on the worker that it is
        assigned to.

        Is there any advantage of moving the processing of vertices
        from one worker to another. Can't there be affinity between a
        worker and the vertices it processes?


    Often there will be affinity, but if the graph itself evolves
    during computation (some sort of iterative pruning or clustering),
    then moving around may make sense.  Also: if nodes die.
      -jake


        Regards,
        Praveen

        On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix
        <jake.man...@gmail.com <mailto:jake.man...@gmail.com>> wrote:

            [hama-user to bcc:]

            Let's not crosspost, please, it make the thread of
            conversation totally opaque as to who is talking about what.

            On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati
            <praveensrip...@gmail.com
            <mailto:praveensrip...@gmail.com>> wrote:

                Thanks to Thomas and Avery for the response.

                > For Giraph you are quite correct, all the stuff is
                submitted as a MR job. But a full map stage is not a
                superstep, the whole computation is a done in one
                mapping phase.

                So a map task in MR corresponds to a computation phase
                in a superstep. Once the computation phase for a
                superstep is complete, the vertex output is stored
                using the defined OutputFormat, the message sent (may
                be) to another vertex and the map task is stopped.
                Once the barrier synchronization phase is complete,
                another set of map tasks are invoked for the vertices
                which have received a message.


            In Giraph, each superstep does not lead to storage into an
            OutputFormat.  The data lives all in memory from the time
            the first superstep starts to the time the final superstep
            stops (except that for tolerance of failures, checkpoints
            are stored to disk at user-specified intervals).  There is
            only one set of map tasks for the Giraph job - those
            long-running map tasks run possibly many supersteps.

                In a regular MR Job (not Giraph) the number of Map
                tasks equals to the number of InputSplits. But, in
                case of Giraph the total number of maps to be launched
                is usually more than the number of input vertices.


            Number of maps > number of input vertices?  Not at all.
             That would be insane.  We want to be able to run over
            multi-billion vertex graphs.  We're going to launch
            multiple billions of mappers?   The splitting of the data
            in Giraph is very similar to in a regular MR job, divide
            up your input data among the number of mappers you have,
            and you're off and running.


                > Where are the incoming, outgoing messages and state
                stored
                > Memory

                What happens if a particular node is lost in case of
                Hama and Giraph? Are the messages not persisted
                somewhere to be fetched later.


            If nodes are lost, the system has to back up to the most
            recent checkpoint, where graph state has been persisted to
            HDFS.  Messages are not currently persisted, but the state
            at which the graph was in to produce any messages was.

                > In Giraph, vertices can move around workers between
                supersteps.  A vertex will run on the worker that it
                is assigned to.

                Is data locality considered while moving vertices
                around workers in Giraph?


            Data is all in memory, and typical graph algorithms are
            basically sending roughly the size of the entire graph
            (number of total edges) out over distributed RPC in any
            given superstep, so shuffling the graph around by RPC is
            not much more to do.


                > As you can see, you could write a MapReduce Engine
                with BSP on top of Apache Hama.

                It's being the done other way, BSP is implemented in
                Giraph using Hadoop.


            I'll let the Hama people explain to you about how one
            would implement MR on top of Hama.  You are correct that
            in Giraph, the Hadoop JobTracker/TaskTracker and HDFS are
            used as substrate to help implement BSP (although I would
            not say that "MR" is being used to implement BSP, as there
            is no MR going on in Giraph).

              -jake



                Praveen

                On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching
                <ach...@apache.org <mailto:ach...@apache.org>> wrote:

                    Hi Praveen,

                    Answers inline.  Hope that helps!

                    Avery

                    On 12/8/11 10:16 PM, Praveen Sripati wrote:

                    Hi,

                    I know about MapReduce/Hadoop and trying to get
                    myself around BSP/Hama-Giraph by comparing MR and
                    BSP.

                    - Map Phase in MR is similar to Computation Phase
                    in BSP. BSP allows for process to exchange data
                    in the communication phase, but there is no
                    communication between the mappers in the Map
                    Phase. Though the data flows from Map tasks to
                    Reducer tasks. Please correct me if I am wrong.
                    Any other significant differences?

                    I suppose you can think of it that way.  I like to
                    compare a BSP superstep to a MapReduce job since
                    it's computation and communication.

                    - After going through the documentation for Hama
                    and Giraph, noticed that they both use Hadoop as
                    the underlying framework. In both Hama and Giraph
                    an MR Job is submitted. Does each superstep in
                    BSP correspond to a Job in MR? Where are the
                    incoming, outgoing messages and state stored -
                    HDFS or HBase or Local or pluggable?

                    My understanding of Hama is that they have their
                    own BSP framework.  Giraph can be run on a Hadoop
                    installation, it does not have its own
                    computational framework.  A Giraph job is
                    submitted to a Hadoop installation as a Map-only
                    job.  Hama will have its own BSP lauching framework.

In Giraph, the state is stored all in memory.Graphs are loaded/stored through

                    VertexInputFormat/VertexOutputFormat (very similar
                    to Hadoop).  You could implement your own
                    VertexInputFormat/VertexOutputFormat to use HDFS,
                    HBase, etc. as your graph stable storage.

                    - If a Vertex is deactivated and again activated
                    after receiving a message, does is run on the
                    same node or a different node in the cluster?

                    In Giraph, vertices can move around workers
                    between supersteps.  A vertex will run on the
                    worker that it is assigned to.

                    Regards,
                    Praveen

Re: Comparing BSP and MR

Reply via email to