[ https://issues.apache.org/jira/browse/HAMA-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward J. Yoon resolved HAMA-560.
---------------------------------
Resolution: Duplicate
Duplicate of HAMA-531.
> Partitioning should be done in parallel
> ---------------------------------------
>
> Key: HAMA-560
> URL: https://issues.apache.org/jira/browse/HAMA-560
> Project: Hama
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.4.0
> Reporter: praveen sripati
>
> Currently, partitioning happens on the node from which the job is
> submitted, in BSPJobClient#submitJobInternal(). The partitioning runs
> sequentially, so it will become a bottleneck as the input data size grows.
> Partitioning in parallel should also reduce the completion time of the job.
> Here are some of the options to evaluate:
> - Use multiple threads to do the partitioning in BSPJobClient#partition().
> This is an easy fix, but the partitioning is still restricted to a single
> node. There might also be problems with simultaneous writes to the same file.
> - Use MR to partition the data. Check whether BSPJobClient#partition() can
> kick off an MR job to partition the input data. The number of reducers
> should be set to the number of BSP tasks.
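A minimal sketch of the first option (multiple threads inside the client), assuming hash partitioning with the same formula as Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numPartitions`. The class and method names are hypothetical and are not Hama's actual BSPJobClient#partition(). Each worker writes only to its own per-partition buffers (standing in for one output file per partition and worker), which sidesteps the simultaneous-write problem; the buffers are then concatenated per partition in worker order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of multi-threaded hash partitioning; not Hama's API. */
public class ParallelPartitionSketch {

    /** Non-negative hash modulo the partition count (same formula as Hadoop's HashPartitioner). */
    public static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /**
     * Splits the input into contiguous chunks, one per worker thread. Each worker fills
     * private per-partition buffers, so no two threads ever write to the same output.
     * Buffers are merged per partition in worker order, preserving input order.
     */
    public static List<List<String>> partition(List<String> records, int numPartitions, int numThreads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            int chunk = (records.size() + numThreads - 1) / numThreads;
            List<Future<List<List<String>>>> futures = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                int from = Math.min(records.size(), t * chunk);
                int to = Math.min(records.size(), from + chunk);
                List<String> slice = records.subList(from, to); // read-only view, safe to share
                futures.add(pool.submit(() -> {
                    List<List<String>> local = new ArrayList<>();
                    for (int p = 0; p < numPartitions; p++) local.add(new ArrayList<>());
                    for (String r : slice) local.get(partitionFor(r, numPartitions)).add(r);
                    return local;
                }));
            }
            List<List<String>> result = new ArrayList<>();
            for (int p = 0; p < numPartitions; p++) result.add(new ArrayList<>());
            for (Future<List<List<String>>> f : futures) {
                List<List<String>> local = f.get(); // blocks until this worker finishes
                for (int p = 0; p < numPartitions; p++) result.get(p).addAll(local.get(p));
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) input.add("vertex-" + i);
        // 4 partitions (one per BSP task), 3 worker threads
        List<List<String>> parts = partition(input, 4, 3);
        for (int p = 0; p < parts.size(); p++) {
            System.out.println("partition " + p + ": " + parts.get(p).size() + " records");
        }
    }
}
```

Because the chunks are contiguous and merged in worker order, the result is identical to running with a single thread. As the issue notes, this still keeps all partitioning on one node. The same hash formula is what an MR job would use by default to route records to reducers, so the second option would yield one reducer output per BSP task when the number of reducers equals the number of tasks.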