Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
The "Partitioning" page has been changed by edwardyoon:
http://wiki.apache.org/hama/Partitioning?action=diff&rev1=4&rev2=5

  == User-defined partitioning ==

- The partitioner is designed for determining how to distribute the input data among computing workers of a Bulk Synchronous Parallel processing. Remember, this is not related with output collection, unlike Map/Reduce's partition function.
+ In the Hama BSP computing framework, the partition function is used to make Bulk Synchronous Parallel processing scalable and to determine how the slices of input data are distributed among BSP processors. Unlike the MapReduce data processing model, many scientific algorithms based on the message-passing Bulk Synchronous Parallel model require that a processor obtain “nearby or related” data from other processors in order to complete the processing. In this case, processors determine their communication partners, or neighbors, using the partition function.

- Input data-partitioning works as following sequence:
+ Internally, input data partitioning works in the following sequence:

   * If the user specified a partition function, a "partitioning job" is run internally as a pre-processing step.
   * Each task of the "partitioning job" reads its assigned data block and rewrites its records to particular partition files.
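
The partitioning sequence described in this change can be illustrated with a minimal, self-contained sketch. It does not use Hama's own classes: the SketchPartitioner interface, the HashSketchPartitioner class, and the tab-separated record layout are all assumptions made for illustration, and in-memory lists stand in for the partition files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a partition function; not Hama's actual interface. */
interface SketchPartitioner<K, V> {
    int getPartition(K key, V value, int numTasks);
}

/** Hash-based assignment: records with equal keys always go to the same task. */
class HashSketchPartitioner<K, V> implements SketchPartitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numTasks) {
        // floorMod keeps the index in [0, numTasks) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numTasks);
    }
}

class PartitioningSketch {
    /**
     * Mimics the pre-processing step: every record is rewritten into the
     * bucket (a stand-in for a partition file) chosen by the partitioner.
     * Assumed record layout: "key<TAB>payload".
     */
    static List<List<String>> partition(List<String> records, int numTasks,
                                        SketchPartitioner<String, String> partitioner) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String record : records) {
            String key = record.split("\t", 2)[0];
            int target = partitioner.getPartition(key, record, numTasks);
            buckets.get(target).add(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("v1\tv2,v3", "v2\tv1", "v3\tv1", "v1\tv4");
        List<List<String>> buckets =
                partition(records, 2, new HashSketchPartitioner<String, String>());
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("partition " + i + ": " + buckets.get(i));
        }
    }
}
```

Because the assignment depends only on the key, any processor can recompute which partition holds a given key, which is how a processor could locate its communication partners in this sketch.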
