[Hama Wiki] Update of "Partitioning" by edwardyoon

Apache Wiki Sat, 28 Dec 2013 02:46:40 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change 
notification.


The "Partitioning" page has been changed by edwardyoon:
https://wiki.apache.org/hama/Partitioning?action=diff&rev1=29&rev2=30

   * The Job Scheduler assigns each partition to the proper task based on its partition ID.
   * After pre-partitioning is done, launch the BSP job.
  
+ In the NoSQL table input case (stores that support range or random access by sorted key), the pre-processing step is skipped because such stores support range scans.
- --
-  * ''Instead of running a separate job, we inject a partitioning superstep before the first superstep of the job (this depends on the Superstep API). For graph jobs, this partitioning superstep can be configured with a graph-specific partitioning class that partitions and loads vertices. - Suraj Menon''
-   * ''Since the number of BSP tasks cannot be scaled between supersteps in a single job, the key question is how to launch a number of tasks that differs from the number of file blocks. Even though graph jobs use their own graph partitioning class, vertex structure parsing must still be done in the loadVertices() method to avoid unnecessary IO overhead. So the advantage is that the additional job can be removed, and the disadvantage is that we have to manage two partitioning classes, one for BSP and one for Graph. - Edward J. Yoon''
-  * ''Instead of being written to HDFS, which creates a copy of the input files in the HDFS cluster (too costly, I believe), the partitions should be written to and read from local files. - Suraj Menon''
-   * ''If we want to run multiple jobs on the same input (for example, a friendship graph used as input for a PageRank job, an SSSP job, etc.), reuse of partitions should be considered. If partitions are written to the local FS, metadata must be managed to reuse them. - Edward J. Yoon''
- 
- In the NoSQL table input case (stores that support range or random access by sorted key), partitions don't need to be rewritten. In addition, a Scanner can be used instead of the basic 'region' or 'tablet' splits to force the number of processors.
  
   * The Job Scheduler assigns each Scanner or tablet, with its partition ID, to the proper task, and the BSP job is launched.
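The file-input flow in the context lines of this diff (pre-partition the input, have the Job Scheduler hand each partition to a task by partition ID, then launch the BSP job) can be sketched in plain Java. This is a hypothetical illustration, not Hama's API: the class and method names (`PrePartitionSketch`, `partitionId`, `prePartition`, `assignToTasks`) are invented here, and hashing the key modulo the task count is assumed as the partition function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of pre-partitioning followed by partition-ID-based task assignment. */
public class PrePartitionSketch {

    /** Assumed partition function: non-negative key hash modulo the number of tasks. */
    static int partitionId(String key, int numTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numTasks;
    }

    /** Pre-partitioning step: buckets every input key into one of numTasks partitions. */
    static List<List<String>> prePartition(List<String> keys, int numTasks) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(partitionId(key, numTasks)).add(key);
        }
        return partitions;
    }

    /** Scheduler step: the task with index i receives the partition whose ID is i. */
    static Map<Integer, List<String>> assignToTasks(List<List<String>> partitions) {
        Map<Integer, List<String>> assignment = new TreeMap<>();
        for (int id = 0; id < partitions.size(); id++) {
            assignment.put(id, partitions.get(id));
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> vertexIds = List.of("v1", "v2", "v3", "v4", "v5");
        Map<Integer, List<String>> plan = assignToTasks(prePartition(vertexIds, 3));
        // In the flow described above, the BSP job would be launched at this point.
        plan.forEach((task, partition) -> System.out.println("task " + task + " <- " + partition));
    }
}
```

Because the assumed partition function is deterministic, a given key always lands in the same partition, which is what lets a scheduler route partition i to task i without re-reading the input.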
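For the NoSQL case, rewriting partitions is replaced by giving each task a Scanner over a slice of the sorted key space, so the number of tasks need not follow region or tablet boundaries. A minimal sketch of that splitting, assuming numeric row keys and a hypothetical `splitKeyRange` helper (not an HBase or Hama API); each resulting range would become one Scanner's start/stop bounds and carry its partition ID to the scheduler.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a sorted key space into one scan range per task. */
public class ScanRangeSketch {

    /** Half-open key range [startKey, endKey) that one task would scan. */
    static final class ScanRange {
        final int partitionId;
        final long startKey; // inclusive
        final long endKey;   // exclusive

        ScanRange(int partitionId, long startKey, long endKey) {
            this.partitionId = partitionId;
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "partition " + partitionId + ": [" + startKey + ", " + endKey + ")";
        }
    }

    /**
     * Splits [startKey, endKey) into numTasks contiguous ranges whose sizes differ by at most one.
     * Ranges may be empty when numTasks exceeds the number of keys.
     * Assumes numeric row keys; real NoSQL stores typically use byte[] keys.
     */
    static List<ScanRange> splitKeyRange(long startKey, long endKey, int numTasks) {
        if (numTasks <= 0 || endKey <= startKey) {
            throw new IllegalArgumentException("need numTasks > 0 and startKey < endKey");
        }
        long width = Math.subtractExact(endKey, startKey); // throws ArithmeticException on overflow
        long base = width / numTasks;
        long extra = width % numTasks; // the first 'extra' ranges get one more key
        List<ScanRange> ranges = new ArrayList<>();
        long lo = startKey;
        for (int i = 0; i < numTasks; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new ScanRange(i, lo, lo + size));
            lo += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Force three tasks over keys 0..9, regardless of how the table is split into regions.
        for (ScanRange r : splitKeyRange(0L, 10L, 3)) {
            System.out.println(r);
        }
    }
}
```

Running `main` prints `partition 0: [0, 4)`, `partition 1: [4, 7)` and `partition 2: [7, 10)`. Contiguous ranges keep each scan sequential on the sorted key, which is the property the NoSQL case relies on to skip pre-processing.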
