[ https://issues.apache.org/jira/browse/HAMA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280133#comment-13280133 ]
Thomas Jungblut commented on HAMA-531:
--------------------------------------

Two possible approaches: we schedule a BSP job that writes to a given number of files, OR we use the same logic as the graph repair, which takes a first superstep to read everything and then distributes it among the tasks. I think the latter solution is quite simple.

bq. Does anyone know how it is done in Giraph?

I don't know, but I'd bet on the second solution, since their mapper input is unlikely to be pre-partitioned.

> Data re-partitioning in BSPJobClient
> ------------------------------------
>
>                 Key: HAMA-531
>                 URL: https://issues.apache.org/jira/browse/HAMA-531
>             Project: Hama
>          Issue Type: Improvement
>            Reporter: Edward J. Yoon
>
> Re-partitioning the data is a very expensive operation. Currently, we process read/write operations sequentially through the HDFS API in BSPJobClient on the client side. This can trigger "too many open files" errors, incurs HDFS overhead, and performs slowly.
> We need to find another way to re-partition the data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
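The superstep-based redistribution discussed above only works if every task maps a given record to the same target peer. A minimal sketch of that mapping, assuming a simple hash partitioner; the class and method names here are illustrative, not Hama's actual API:

```java
// Illustrative sketch (hypothetical names, not the Hama API): the
// deterministic key-to-peer mapping a repartitioning superstep would
// use before sending each record to its target task via messages.
public class HashPartitionSketch {

    // Map a record key to one of numPeers tasks. Masking with
    // Integer.MAX_VALUE clears the sign bit, so the index stays
    // non-negative even for negative hashCode values.
    static int targetPeer(String key, int numPeers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPeers;
    }

    public static void main(String[] args) {
        int numPeers = 4;
        String[] keys = {"vertexA", "vertexB", "vertexC"};
        for (String k : keys) {
            // Every task computes the same assignment for the same key,
            // so the data lands on a consistent peer after the superstep.
            System.out.println(k + " -> peer " + targetPeer(k, numPeers));
        }
    }
}
```

Because the assignment is computed locally by each task, no client-side sequential file handling is needed; the distribution happens inside the cluster during the first superstep.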