[Hadoop Wiki] Trivial Update of "Hamburg" by edwardyoon

Apache Wiki Tue, 28 Jul 2009 00:04:58 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by edwardyoon:
http://wiki.apache.org/hadoop/Hamburg

------------------------------------------------------------------------------
  
  == Motivation ==
  The MapReduce (M/R) programming model is inappropriate to graph problems 
because of the following reasons:
- 
-  ''Do you know other situations that might fall into what you are describing 
above?''
  
   * '''!MapReduce does not support traversing graph''' â A mapper reads 
input data sequentially, and it canât control its input data. In contrast, 
most of the graph problems are based on walking vertices step by step. Walking 
vertices is to expand adjacent vertices from a given vertex.  This operation is 
only available if current input data can be determined by the previous 
operation. In MapReduce, however, the previous operation cannot affect the 
input data of the next operation. Traversing a vertex is the most basic 
premitive operation in graph operations. Consequently, graph processing with 
MapReduce is very limited. In order to come over this limit, we have to avoid 
traverse of graph in order to solve graph problems 
([http://ieeexplore.ieee.org/search/wrapper.jsp?arnumber=5076317 Graph 
Twiddling in a MapReduce World]) or have to use many M/R iterations each time 
walk vertices 
([http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html 
Breadth-First Search (B
 FS) & MapReduce]).
   * '''!MapReduce limits to assigning one reducer''' - In a MapReduce problem 
on graph data, assigning appropriate reducers according to their relations of 
partitioned graphs is very hard. Assigning only one reducer is a 
straightforward way to solve their complex relations, but it is apparent to 
cause deterioration of scalability.
@@ -13, +11 @@

  
  Therefore, we need a new programming model for graph processing on Hadoop.
  
-  ''We should survey other areas -- Edward J.''
+ == Goal ==
  
- == Goal ==
   * Support graph traverse
   * Support a simple programming interface dealing graph data
   * Follow the scalability concept of shared-nothing architecture
@@ -33, +30 @@

  
  Each worker will process the data fragments stored locally. And then, We can 
do bulk synchronization using collected communication data. The 'Computation' 
and 'Bulk synchronization' can be performed iteratively, Data for 
synchronization can be compressed to reduce network usage. The main difference 
between Hamburg and M/R is that Hamburg does not make intermediate data 
aggregate into reducer. Instead, each computation node communicates only 
necessary data into one another.  It will be efficient if total communicated 
data is smaller then intermediate data to be aggregated into reducers. Plainly, 
It aims to improve the performance of traverse operations in Graph computing. 
  
- For example, to explores all the neighboring nodes from the root node using 
Map/Reduce (FYI, 
[http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html 
Breadth-First Search (BFS) & MapReduce]), We need a lot of iterations to get 
next vertex per-hop time.
- 
- Let's assume the graph looks like presented below:
+ For example, let's assume the graph looks like presented below:
  
  
[http://lh5.ggpht.com/_DBxyBGtfa3g/SmQTwhOSGwI/AAAAAAAABmY/ERiJ2BUFxI0/s800/figure2.PNG]
+ 
+ We can store the graph as a adjacency matrix in HDFS or HBase and use 
MapReduce to explores all the neighboring nodes, but we need a lot of 
iterations to get next vertex per-hop time since as mentioned motivation. 
However, practically, we need only one communication between server2 and 
server3 for determine the distance between 6 and 7 as below:
+ 
+ 
[http://lh6.ggpht.com/_DBxyBGtfa3g/SmQTwvxT2zI/AAAAAAAABmc/BRrv7plzPtc/s800/figure3.PNG]
  
   ''TODO: write more detail of above. -- Edward J.''

[Hadoop Wiki] Trivial Update of "Hamburg" by edwardyoon

Reply via email to