> There must definitely have been some thought around this.

Yes, there was some thought put into the design before writing - in Hadoop's case - hundreds of thousands of lines of code, and, for Giraph, thousands so far. I invite you to read the MapReduce paper (http://research.google.com/archive/mapreduce.html) and the Pregel paper (http://dl.acm.org/citation.cfm?id=1807184) to learn more about what prompted these designs.
On Sat, Dec 10, 2011 at 9:14 AM, Avery Ching <ach...@apache.org> wrote:
> On 12/9/11 10:22 PM, Praveen Sripati wrote:
>
> Jake,
>
>> Giraph maps do communicate: via RPC. This is done repeatedly in every mapper, during the compute phase. This is something that is not normal to MapReduce, it is special to Giraph.
>
> There must definitely have been some thought around this. But, we can also have a mapper correspond to just the computation phase in a superstep and avoid communication between the mappers, as in MapReduce. Later spawn another set of mappers for the next superstep. There might be some reason why communication between mappers was avoided in MR.
>
> Communication between mappers is not part of the MapReduce computing model. Therefore, it doesn't make sense for them to include it, as it would unnecessarily complicate the fault-tolerance recovery.
>
> Any thoughts?
>
> Regards,
> Praveen
>
> On Sat, Dec 10, 2011 at 10:35 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>>
>> On Fri, Dec 9, 2011 at 8:16 PM, Praveen Sripati <praveensrip...@gmail.com> wrote:
>>>
>>> Jake,
>>>
>>> > Let's not crosspost, please, it makes the thread of conversation totally opaque as to who is talking about what.
>>>
>>> Agree. I got it after the OP.
>>>
>>> > There is only one set of map tasks for the Giraph job - those long-running map tasks run possibly many supersteps.
>>>
>>> OK. But, map tasks don't communicate with each other. How are messages sent across in the communication phase of a superstep that happens within a map?
>>
>> Giraph maps do communicate: via RPC. This is done repeatedly in every mapper, during the compute phase. This is something that is not normal to MapReduce, it is special to Giraph.
>>
>>> > In Giraph, vertices can move around workers between supersteps. A vertex will run on the worker that it is assigned to.
>>>
>>> Is there any advantage in moving the processing of vertices from one worker to another? Can't there be affinity between a worker and the vertices it processes?
>>
>> Often there will be affinity, but if the graph itself evolves during computation (some sort of iterative pruning or clustering), then moving around may make sense. Also: if nodes die.
>>
>> -jake
>>
>>> Regards,
>>> Praveen
>>>
>>> On Fri, Dec 9, 2011 at 11:33 PM, Jake Mannix <jake.man...@gmail.com> wrote:
>>>>
>>>> [hama-user to bcc:]
>>>>
>>>> Let's not crosspost, please, it makes the thread of conversation totally opaque as to who is talking about what.
>>>>
>>>> On Fri, Dec 9, 2011 at 1:42 AM, Praveen Sripati <praveensrip...@gmail.com> wrote:
>>>>>
>>>>> Thanks to Thomas and Avery for the response.
>>>>>
>>>>> > For Giraph you are quite correct, all the stuff is submitted as a MR job. But a full map stage is not a superstep, the whole computation is done in one mapping phase.
>>>>>
>>>>> So a map task in MR corresponds to a computation phase in a superstep. Once the computation phase for a superstep is complete, the vertex output is stored using the defined OutputFormat, the message sent (maybe) to another vertex and the map task is stopped. Once the barrier synchronization phase is complete, another set of map tasks is invoked for the vertices which have received a message.
>>>>
>>>> In Giraph, each superstep does not lead to storage into an OutputFormat. The data all lives in memory from the time the first superstep starts to the time the final superstep stops (except that for tolerance of failures, checkpoints are stored to disk at user-specified intervals). There is only one set of map tasks for the Giraph job - those long-running map tasks run possibly many supersteps.
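The execution model Jake describes above - one set of long-running workers holding the graph in memory, running superstep after superstep, exchanging messages only at the barrier, and checkpointing at intervals for fault tolerance - can be sketched in a few lines. This is a toy, single-process Python illustration; the names (`run_supersteps`, `count_compute`, and so on) are invented for this sketch and are not Giraph's actual API.

```python
import copy

# Toy single-process sketch of the execution model described above: vertex
# state stays in memory across supersteps, messages sent during the compute
# phase are buffered and only delivered at the barrier, and state is
# checkpointed every few supersteps so a lost worker can be recovered.
def run_supersteps(vertices, compute, num_supersteps,
                   checkpoint_interval=2, fail_at=None):
    """vertices: {vid: value}; compute(vid, value, messages) ->
    (new_value, [(dest_vid, message), ...])."""
    vertices = copy.deepcopy(vertices)        # don't mutate the caller's dict
    inbox = {vid: [] for vid in vertices}
    checkpoint = (0, copy.deepcopy((vertices, inbox)))
    superstep = 0
    failed = False
    while superstep < num_supersteps:
        if fail_at is not None and superstep == fail_at and not failed:
            failed = True                     # simulate losing a worker:
            superstep, (vertices, inbox) = checkpoint[0], copy.deepcopy(checkpoint[1])
            continue                          # back up to the checkpoint and replay
        outbox = {vid: [] for vid in vertices}
        for vid in vertices:
            new_value, outgoing = compute(vid, vertices[vid], inbox[vid])
            vertices[vid] = new_value
            for dest, msg in outgoing:
                outbox[dest].append(msg)      # buffered "RPC" send
        inbox = outbox                        # barrier: all messages delivered at once
        superstep += 1
        if superstep % checkpoint_interval == 0:
            checkpoint = (superstep, copy.deepcopy((vertices, inbox)))
    return vertices

# Example: every vertex "greets" the two others once per superstep. With 4
# supersteps, greetings sent in supersteps 1-3 are received in supersteps 2-4,
# so each vertex ends up with 3 * 2 = 6, with or without a simulated failure
# (deterministic compute plus checkpoint replay gives the same answer).
def count_compute(vid, value, messages):
    return value + len(messages), [(d, 1) for d in (0, 1, 2) if d != vid]

print(run_supersteps({0: 0, 1: 0, 2: 0}, count_compute, 4))             # {0: 6, 1: 6, 2: 6}
print(run_supersteps({0: 0, 1: 0, 2: 0}, count_compute, 4, fail_at=3))  # {0: 6, 1: 6, 2: 6}
```

Note how the barrier shows up as the single `inbox = outbox` swap: nothing sent in a superstep is visible until the next one, which is exactly why a regular MR map phase (no mid-phase communication) needed to be extended for this model.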
>>>>
>>>>> In a regular MR Job (not Giraph) the number of Map tasks equals the number of InputSplits. But, in the case of Giraph the total number of maps to be launched is usually more than the number of input vertices.
>>>>
>>>> Number of maps > number of input vertices? Not at all. That would be insane. We want to be able to run over multi-billion vertex graphs. We're going to launch multiple billions of mappers? The splitting of the data in Giraph is very similar to that in a regular MR job: divide up your input data among the number of mappers you have, and you're off and running.
>>>>
>>>>> > Where are the incoming, outgoing messages and state stored
>>>>> > Memory
>>>>>
>>>>> What happens if a particular node is lost in the case of Hama and Giraph? Are the messages not persisted somewhere to be fetched later?
>>>>
>>>> If nodes are lost, the system has to back up to the most recent checkpoint, where graph state has been persisted to HDFS. Messages are not currently persisted, but the state the graph was in to produce any messages was.
>>>>
>>>>> > In Giraph, vertices can move around workers between supersteps. A vertex will run on the worker that it is assigned to.
>>>>>
>>>>> Is data locality considered while moving vertices around workers in Giraph?
>>>>
>>>> Data is all in memory, and typical graph algorithms basically send roughly the size of the entire graph (the total number of edges) out over distributed RPC in any given superstep, so shuffling the graph around by RPC is not much extra work.
>>>>
>>>>> > As you can see, you could write a MapReduce Engine with BSP on top of Apache Hama.
>>>>>
>>>>> It's being done the other way around: BSP is implemented in Giraph using Hadoop.
>>>>
>>>> I'll let the Hama people explain to you how one would implement MR on top of Hama.
>>>> You are correct that in Giraph, the Hadoop JobTracker/TaskTracker and HDFS are used as a substrate to help implement BSP (although I would not say that "MR" is being used to implement BSP, as there is no MR going on in Giraph).
>>>>
>>>> -jake
>>>>
>>>>> Praveen
>>>>>
>>>>> On Fri, Dec 9, 2011 at 12:51 PM, Avery Ching <ach...@apache.org> wrote:
>>>>>>
>>>>>> Hi Praveen,
>>>>>>
>>>>>> Answers inline. Hope that helps!
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>> On 12/8/11 10:16 PM, Praveen Sripati wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I know about MapReduce/Hadoop and am trying to get my head around BSP/Hama-Giraph by comparing MR and BSP.
>>>>>>
>>>>>> - Map Phase in MR is similar to Computation Phase in BSP. BSP allows processes to exchange data in the communication phase, but there is no communication between the mappers in the Map Phase, though data does flow from Map tasks to Reduce tasks. Please correct me if I am wrong. Any other significant differences?
>>>>>>
>>>>>> I suppose you can think of it that way. I like to compare a BSP superstep to a MapReduce job since it's computation and communication.
>>>>>>
>>>>>> - After going through the documentation for Hama and Giraph, I noticed that they both use Hadoop as the underlying framework. In both Hama and Giraph an MR Job is submitted. Does each superstep in BSP correspond to a Job in MR? Where are the incoming, outgoing messages and state stored - HDFS or HBase or Local or pluggable?
>>>>>>
>>>>>> My understanding of Hama is that they have their own BSP framework. Giraph can be run on a Hadoop installation; it does not have its own computational framework. A Giraph job is submitted to a Hadoop installation as a Map-only job. Hama will have its own BSP launching framework.
>>>>>>
>>>>>> In Giraph, the state is all stored in memory.
>>>>>> Graphs are loaded/stored through VertexInputFormat/VertexOutputFormat (very similar to Hadoop). You could implement your own VertexInputFormat/VertexOutputFormat to use HDFS, HBase, etc. as your graph's stable storage.
>>>>>>
>>>>>> - If a Vertex is deactivated and again activated after receiving a message, does it run on the same node or a different node in the cluster?
>>>>>>
>>>>>> In Giraph, vertices can move around workers between supersteps. A vertex will run on the worker that it is assigned to.
>>>>>>
>>>>>> Regards,
>>>>>> Praveen
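To tie the vertex activation/deactivation discussion together, here is roughly what a vertex-centric program looks like in this model: the classic maximum-value propagation example from the Pregel paper, written as a framework-free Python sketch rather than against Giraph's actual Vertex API (the function and parameter names here are my own). Each vertex repeatedly takes the max of its incoming messages, tells its neighbours when its value changes, and votes to halt; an incoming message reactivates a halted vertex, matching the deactivate/reactivate behaviour asked about above.

```python
# Pregel-style maximum-value propagation: every vertex ends up holding the
# largest value in the (strongly connected) graph. A vertex votes to halt
# after each superstep and is woken up only if a message arrives for it.
def propagate_max(values, edges, max_supersteps=50):
    """values: {vertex: number}; edges: {vertex: [neighbour, ...]} (directed)."""
    values = dict(values)                 # don't mutate the caller's dict
    inbox = {v: [] for v in values}
    active = set(values)                  # every vertex starts active
    for step in range(max_supersteps):
        if not active:
            break                         # all vertices have voted to halt
        outbox = {v: [] for v in values}
        for v in active:
            new_value = max([values[v]] + inbox[v])
            if step == 0 or new_value > values[v]:
                values[v] = new_value     # value changed: tell the neighbours
                for n in edges.get(v, []):
                    outbox[n].append(new_value)
        inbox = outbox                    # barrier between supersteps
        active = {v for v, msgs in inbox.items() if msgs}  # messages reactivate
    return values

# A 3-vertex directed ring: the maximum (6) spreads to every vertex, and the
# computation halts on its own once no vertex has anything new to say.
print(propagate_max({1: 3, 2: 6, 3: 2}, {1: [2], 2: [3], 3: [1]}))  # {1: 6, 2: 6, 3: 6}
```

In Giraph the same per-vertex logic would live in a compute method on a Vertex, with the framework supplying message delivery, the barrier, and worker assignment; this sketch only shows the control flow.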