[Hadoop Wiki] Update of "Hamburg" by edwardyoon

Apache Wiki Wed, 01 Jul 2009 23:56:44 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.


The following page has been changed by edwardyoon:
http://wiki.apache.org/hadoop/Hamburg

The comment on the change is:
become a proposal.

------------------------------------------------------------------------------
  ## page was renamed from Hambrug
  
+ == Motivation ==
+ The MapReduce (M/R) programming model is ill-suited to problems whose data 
portions each depend on many other portions through very complicated relations. 
Such problems typically force one of the following:
+  * restriction to a single reducer
+   * When the relations among data are very complex, assigning intermediate 
data to appropriate reducers while respecting the dependencies between graph 
partitions can be very hard. Using a single reducer is a straightforward way to 
handle such complex dependencies, but it clearly degrades scalability.
+  * many M/R iterations
+  * or a more complicated M/R program
+   * To avoid the two inefficient approaches above, the M/R program must be 
complicated with extra code that communicates data among data nodes.
+ 
+ These problems are common in many areas; many graph problems are typical 
examples. 
+ 
+ TODO - write description of an example.
+ 
+ Therefore, we propose a new programming model named Hamburg. Its main 
objective is to efficiently support problems whose data have complex 
dependencies on one another. This page is the initial draft of our proposal.
+ 
+ == Goal ==
+  * Follow the scalability principles of the shared-nothing architecture
+  * Support a simple programming model for computing complex relations, such 
as those in graph data.
+ 
  == Hamburg ==
- Let's discuss about the graph computing framework named Hamburg.
+ Hamburg is an alternative to the M/R programming model. It is based on the 
bulk synchronous parallel (BSP) model. Like M/R, Hamburg takes advantage of the 
shared-nothing (SN) architecture, so I expect it to scale with almost no 
performance degradation as the number of participating nodes increases.
+ Following BSP, each Hamburg computation step (a "superstep") consists of three 
sub-steps:
+  * Computation on data that reside in local storage; this is similar to the 
map operation in M/R.
+  * Each node sends the data other nodes need to those nodes.
+  * All processors synchronize at a barrier, waiting for all communication 
actions to complete.
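The three sub-steps can be illustrated with a small, single-process simulation. This is a hypothetical sketch, not Hamburg's API: the `Node` class, `run_bsp` and the `owner` partitioning function are names invented here, and minimum-label propagation for connected components is chosen only as a representative graph problem. The barrier is modelled by delivering messages only after every node has finished computing, so each node reads only messages from the previous superstep.

```python
from collections import defaultdict


class Node:
    """Hypothetical compute node that owns one partition of an undirected graph."""

    def __init__(self, node_id, adjacency):
        self.node_id = node_id
        self.adjacency = adjacency               # local vertex -> list of neighbours
        self.labels = {v: v for v in adjacency}  # each vertex starts with its own id

    def compute(self, inbox, first_step):
        """Sub-step 1: update local labels from last superstep's messages.

        Returns the (target_vertex, label) messages to send in sub-step 2.
        """
        outgoing = []
        for v, neighbours in self.adjacency.items():
            new_label = min([self.labels[v], *inbox.get(v, [])])
            # Only vertices whose label dropped (or all, in the first step) send.
            if first_step or new_label < self.labels[v]:
                self.labels[v] = new_label
                outgoing.extend((u, new_label) for u in neighbours)
        return outgoing


def run_bsp(nodes, owner):
    """Run supersteps until no messages are sent; `owner` maps vertex -> node id."""
    inboxes = {n.node_id: {} for n in nodes}
    step = 0
    while True:
        # Sub-steps 1 and 2: every node computes on local data and emits messages.
        outboxes = [n.compute(inboxes[n.node_id], step == 0) for n in nodes]
        # Sub-step 3 (barrier): deliver messages only after all nodes have finished.
        inboxes = {n.node_id: defaultdict(list) for n in nodes}
        sent = 0
        for outgoing in outboxes:
            for target, label in outgoing:
                inboxes[owner(target)][target].append(label)
                sent += 1
        step += 1
        if sent == 0:
            return step


# Usage: components {0, 1, 2} and {3, 4}, split across two nodes.
nodes = [Node(0, {0: [1], 3: [4]}), Node(1, {1: [0, 2], 2: [1], 4: [3]})]
steps = run_bsp(nodes, owner=lambda v: 0 if v in (0, 3) else 1)
labels = dict(sorted((v, lab) for n in nodes for v, lab in n.labels.items()))
print(labels)  # {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
```

Each node touches only its own partition and its inbox; the only cross-node interaction is the message exchange at the barrier, which is the property the proposal relies on for shared-nothing scalability.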
+ The main difference between Hamburg and M/R is that Hamburg does not aggregate 
intermediate data into reducers. Instead, computation nodes exchange only the 
data they need with one another. 
+ This is more efficient whenever the total amount of communicated data is 
smaller than the intermediate data that would be aggregated into reducers.
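The efficiency condition can be made concrete with a rough, hypothetical cost model for one label-propagation round; the function names are invented here. It assumes an M/R formulation in which the mapper emits one record per (vertex, neighbour) pair and every record is shuffled to reducers, and a BSP formulation in which only messages addressed to vertices on another node cross the network. It ignores combiners, message sizes and data locality in the shuffle.

```python
from collections import defaultdict
from itertools import combinations


def mr_shuffled_records(adjacency):
    """M/R (assumed formulation): one map output per (vertex, neighbour) pair,
    all of which are shuffled to reducers."""
    return sum(len(nbrs) for nbrs in adjacency.values())


def bsp_remote_messages(adjacency, owner):
    """BSP: only messages whose target vertex lives on another node cross the network."""
    return sum(1 for v, nbrs in adjacency.items() for u in nbrs if owner(u) != owner(v))


# Two 4-vertex cliques joined by a single bridge edge (3, 4).
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
adjacency = defaultdict(list)
for a, b in edges:
    adjacency[a].append(b)
    adjacency[b].append(a)

mr = mr_shuffled_records(adjacency)                      # every directed pair
good = bsp_remote_messages(adjacency, lambda v: v // 4)  # one clique per node
poor = bsp_remote_messages(adjacency, lambda v: v % 2)   # graph structure ignored
print(mr, good, poor)  # 26 2 18
```

With a partitioning that keeps each clique on one node, only the bridge edge generates network traffic. A partitioning that ignores the graph structure narrows the gap, so under this model the benefit depends on how well the data are partitioned.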
  
+ === Initial contributors ===
- ----
- If you want to talk directly with us,
- 
   * Edward J. (edwardyoon AT apache.org)
   * Hyunsik Choi (hyunsik.choi AT gmail.com)
+ 
+ Any volunteers are welcome.
  
  == Related Projects ==
