[jira] [Resolved] (HAMA-560) Partitioning should be done in parallel

Edward J. Yoon (Resolved) (JIRA) Thu, 19 Apr 2012 21:28:07 -0700

     [ 
https://issues.apache.org/jira/browse/HAMA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Edward J. Yoon resolved HAMA-560.
---------------------------------

    Resolution: Duplicate

Duplicate of HAMA-531.
                
> Partitioning should be done in parallel
> ---------------------------------------
>
>                 Key: HAMA-560
>                 URL: https://issues.apache.org/jira/browse/HAMA-560
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.4.0
>            Reporter: praveen sripati
>
> Currently, partitioning happens on the node from which the job is
> submitted, in BSPJobClient#submitJobInternal(). The partitioning runs
> sequentially, which will become a bottleneck as the input data size grows.
> With partitioning in parallel, the completion time for the job should also
> improve.
>
> Here are some of the options to evaluate:
> - Use multiple threads to do the partitioning in BSPJobClient#partition().
> This is an easy fix, but the partitioning is still restricted to a single
> node. There might be problems with simultaneous writes to the same file.
> - Use MR (MapReduce) to partition the data. We need to check whether an MR
> job can be launched from BSPJobClient#partition() to partition the input
> data. The number of reducers should be set to the number of BSP tasks.
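
The multi-threaded option could look roughly like the sketch below. It is not Hama code: the class ParallelPartitionSketch, the methods partitionOf() and partition(), and the use of in-memory lists in place of partition files are all assumptions made for illustration. Each worker thread fills its own private buffers, and the buffers are merged only after the workers finish, which is one way to avoid simultaneous writes to the same output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not part of Hama: partitions records with several
// threads on a single node, using in-memory lists as stand-ins for files.
class ParallelPartitionSketch {

    // Assumed hash partitioner: maps a record to a partition in [0, numPartitions).
    static int partitionOf(String record, int numPartitions) {
        return Math.floorMod(record.hashCode(), numPartitions);
    }

    // Splits the input into one slice per thread, partitions each slice into
    // thread-private buffers, then merges the buffers sequentially.
    // numThreads and numPartitions must both be positive.
    static List<List<String>> partition(List<String> records,
                                        int numPartitions,
                                        int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<List<List<String>>>> futures = new ArrayList<>();
            int chunk = (records.size() + numThreads - 1) / numThreads;
            for (int t = 0; t < numThreads; t++) {
                int from = Math.min(t * chunk, records.size());
                int to = Math.min(from + chunk, records.size());
                List<String> slice = records.subList(from, to);
                futures.add(pool.submit(() -> {
                    // Buffers owned by this task only, so no locking is needed.
                    List<List<String>> local = new ArrayList<>();
                    for (int p = 0; p < numPartitions; p++) {
                        local.add(new ArrayList<>());
                    }
                    for (String record : slice) {
                        local.get(partitionOf(record, numPartitions)).add(record);
                    }
                    return local;
                }));
            }

            // Merge step: a single thread appends each task's buffers in order.
            List<List<String>> merged = new ArrayList<>();
            for (int p = 0; p < numPartitions; p++) {
                merged.add(new ArrayList<>());
            }
            for (Future<List<List<String>>> future : futures) {
                List<List<String>> local = future.get();
                for (int p = 0; p < numPartitions; p++) {
                    merged.get(p).addAll(local.get(p));
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

As the issue notes, this approach still runs on a single node, so it only helps while that node's CPU and I/O are not saturated. In a real implementation the number of partitions would presumably match the number of BSP tasks, as suggested for the reducer count in the MR option.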

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
