Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change 
notification.

The "Partitioning" page has been changed by edwardyoon:
http://wiki.apache.org/hama/Partitioning?action=diff&rev1=4&rev2=5

  == User-defined partitioning ==
  
- The partitioner is designed for determining how to distribute the input data 
among computing workers of a Bulk Synchronous Parallel processing. Remember, 
this is not related with output collection, unlike Map/Reduce's partition 
function.
+ In the Hama BSP computing framework, the partition function serves two 
purposes: it provides scalability for Bulk Synchronous Parallel processing, and 
it determines how slices of the input data are distributed among BSP 
processors. Unlike the MapReduce data processing model, many scientific 
algorithms based on the message-passing Bulk Synchronous Parallel model require 
that a processor obtain “nearby or related” data from other processors in order 
to complete its processing. In this case, processors determine their 
communication partners, or neighbors, using the partition function.
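
As an illustration of the paragraph above, here is a minimal Java sketch of how a partition function can both assign data to processors and let a processor find its communication partners. The names `PartitionFunction`, `partitionOf`, `HASH`, and `neighbors` are hypothetical and are not taken from the Hama API; hash-based assignment is only one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionSketch {
    /** Hypothetical partition function: maps a record key to a processor index. */
    interface PartitionFunction<K> {
        int partitionOf(K key, int numProcessors);
    }

    /** Hash-based assignment (an assumption): equal keys always map to the same processor. */
    static final PartitionFunction<String> HASH =
        (key, n) -> Math.floorMod(key.hashCode(), n);

    /**
     * Returns the distinct processors, other than self, that own the given
     * related keys. These are the peers that processor "self" would message.
     */
    static List<Integer> neighbors(int self, List<String> relatedKeys, int numProcessors) {
        List<Integer> result = new ArrayList<>();
        for (String key : relatedKeys) {
            int owner = HASH.partitionOf(key, numProcessors);
            if (owner != self && !result.contains(owner)) {
                result.add(owner);
            }
        }
        return result;
    }
}
```

Because every processor evaluates the same function, each one can compute the owner of any key locally, without asking a central directory.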
  
- Input data-partitioning works as following sequence:
+ Internally, input data partitioning proceeds in the following sequence:
  
   * If the user specifies a partition function, a "partitioning job" is run 
internally as a pre-processing step.
    * Each task of the "partitioning job" reads its assigned data block and 
rewrites the records to their particular partition files.
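
The pre-processing step described above can be sketched in Java roughly as follows. This is not Hama's implementation: the class `PartitioningJobSketch`, the tab-separated record layout, the hash-based partition function, and the `partition-<i>` file naming are all assumptions made for illustration, and the tasks are assumed to run one after another.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.List;

class PartitioningJobSketch {
    /** Assumed partition function: hash of the key, i.e. the text before the first tab. */
    static int partitionOf(String record, int numPartitions) {
        String key = record.split("\t", 2)[0];
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** One task: read the assigned block and append each record to its partition file. */
    static void runTask(List<String> block, Path outDir, int numPartitions) throws IOException {
        for (String record : block) {
            // "partition-<i>" is a placeholder file name, not a Hama convention.
            Path target = outDir.resolve("partition-" + partitionOf(record, numPartitions));
            Files.write(target, Collections.singletonList(record), StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }
}
```

After all tasks finish, each partition file holds every record whose key maps to that partition, regardless of which input block the record came from. Tasks that really ran concurrently would need per-task output files or some coordination instead of appending to a shared file.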
