[Hama Wiki] Update of "Partitioning" by edwardyoon

Apache Wiki Sun, 06 Jan 2013 21:42:57 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change 
notification.


The "Partitioning" page has been changed by edwardyoon:
http://wiki.apache.org/hama/Partitioning?action=diff&rev1=2&rev2=3

  == User-defined partitioning ==
  
- The partitioner determines how the input data is distributed among the 
computing workers of a Bulk Synchronous Parallel (BSP) job. Note that, unlike 
MapReduce's partition function, it is not related to output collection.
+ The partitioner determines how the input data is distributed among the 
computing workers of a Bulk Synchronous Parallel (BSP) job. Note that, unlike 
Map/Reduce's partition function, it is not related to output collection.
  
- .... 
+ Input data partitioning proceeds in the following sequence:
+ 
+  * If the user specifies a partition function, a "partitioning job" is run 
internally as a pre-processing step.
+   * Each task of the "partitioning job" reads its assigned data block and 
rewrites its records to the corresponding partition files.
+  * After pre-partitioning is done, the BSP job is launched.
+ 
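As an illustration only, the sequence above can be sketched in plain Java without the Hama API. The class `PrePartitionSketch`, the `KeyPartitioner` interface, and the in-memory buckets that stand in for partition files are all assumptions made for this sketch; none of them is taken from Hama itself, and the hash-based choice of partition is just one possible partition function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: not Hama code, only the pre-partitioning idea.
final class PrePartitionSketch {

    /** Assumed partition function: maps a record key to a partition index. */
    interface KeyPartitioner {
        int getPartition(String key, int numPartitions);
    }

    /** Hash-based choice; floorMod keeps the index in [0, numPartitions). */
    static final KeyPartitioner HASH =
        (key, numPartitions) -> Math.floorMod(key.hashCode(), numPartitions);

    /**
     * Stand-in for the "partitioning job": every input record is rewritten
     * into the bucket (here a list instead of a partition file) chosen by
     * the partition function.
     */
    static List<List<String>> partition(List<String> records,
                                        int numPartitions,
                                        KeyPartitioner partitioner) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String record : records) {
            buckets.get(partitioner.getPartition(record, numPartitions)).add(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("apple", "banana", "cherry", "date");
        List<List<String>> parts = partition(input, 3, HASH);
        // Each bucket would be the input of one worker of the subsequent job.
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("partition " + i + ": " + parts.get(i));
        }
    }
}
```

Because the partition index depends only on the key, records with equal keys always end up in the same bucket, which is the property a worker relies on once the actual job starts.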
  
  {{{
    BSPJob job = new BSPJob(conf);
