[ 
https://issues.apache.org/jira/browse/HAMA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward J. Yoon updated HAMA-531:
--------------------------------

    Attachment: patch_v02.txt

This patch fixes the unit tests, except for the weighted graph example (SSSP).
Once they are all done, I'll fix the partitioner.

My plan is to partition the input data with a BSP job. Each task
processes a single input data block and writes its files into the destination
directory. Finally, the files are merged. This way, the number of partitions
can be set to any desired number.
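
To make the plan concrete, here is a rough, in-memory sketch of the two phases (one task per block, then a merge). It is not taken from the patch: the class and method names are hypothetical, lists stand in for the files in the destination directory, and routing records by the hash of a tab-separated key is only an assumption, since the plan does not say how records are assigned to partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Hama or the attached patch.
class RepartitionSketch {

    // Assumption: a record is "key<TAB>value" and is routed by the hash of its key.
    static int partitionOf(String record, int numPartitions) {
        int tab = record.indexOf('\t');
        String key = tab >= 0 ? record.substring(0, tab) : record;
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // One task: read a single input block and produce one in-memory "file" per partition.
    static Map<Integer, List<String>> processBlock(List<String> block, int numPartitions) {
        Map<Integer, List<String>> files = new HashMap<>();
        for (String record : block) {
            files.computeIfAbsent(partitionOf(record, numPartitions), p -> new ArrayList<>())
                 .add(record);
        }
        return files;
    }

    // Merge step: concatenate the per-task files that belong to the same partition.
    static List<List<String>> merge(List<Map<Integer, List<String>>> taskOutputs,
                                    int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (Map<Integer, List<String>> files : taskOutputs) {
            for (Map.Entry<Integer, List<String>> e : files.entrySet()) {
                partitions.get(e.getKey()).addAll(e.getValue());
            }
        }
        return partitions;
    }

    // Runs the tasks one after another here; in the planned job each block
    // would be handled by its own BSP task.
    static List<List<String>> repartition(List<List<String>> blocks, int numPartitions) {
        List<Map<Integer, List<String>>> taskOutputs = new ArrayList<>();
        for (List<String> block : blocks) {
            taskOutputs.add(processBlock(block, numPartitions));
        }
        return merge(taskOutputs, numPartitions);
    }

    public static void main(String[] args) {
        List<List<String>> blocks = Arrays.asList(
            Arrays.asList("a\t1", "b\t2"),
            Arrays.asList("a\t3", "c\t4"));
        List<List<String>> partitions = repartition(blocks, 3);
        for (int p = 0; p < partitions.size(); p++) {
            System.out.println("partition " + p + ": " + partitions.get(p));
        }
    }
}
```

The point of the sketch is that the number of output partitions is a free parameter, independent of the number of input blocks, which is what lets it be set to any desired number.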


                
> Data re-partitioning in BSPJobClient
> ------------------------------------
>
>                 Key: HAMA-531
>                 URL: https://issues.apache.org/jira/browse/HAMA-531
>             Project: Hama
>          Issue Type: Improvement
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>            Priority: Critical
>         Attachments: HAMA-531_1.patch, HAMA-531_2.patch, 
> HAMA-531_final.patch, patch.txt, patch_v02.txt
>
>
> Re-partitioning the data is a very expensive operation. Currently, we
> perform read/write operations sequentially using the HDFS API in
> BSPJobClient on the client side. This can cause "too many open files"
> errors, incurs HDFS overhead, and performs slowly.
> We have to find another way to re-partition the data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
