[ https://issues.apache.org/jira/browse/HAMA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280133#comment-13280133 ]
Thomas Jungblut commented on HAMA-531:
--------------------------------------

Two possible approaches: we schedule a BSP job that writes to a given number of files, OR we use the same logic as the graph repair, which takes a first superstep to read everything and then distributes it among the tasks. I think the latter solution is quite simple.

bq. Does anyone know how it is done in Giraph?

I don't know, but I'd bet on the second solution, since their mapper input is unlikely to be pre-partitioned.

> Data re-partitioning in BSPJobClient
> ------------------------------------
>
>                 Key: HAMA-531
>                 URL: https://issues.apache.org/jira/browse/HAMA-531
>             Project: Hama
>          Issue Type: Improvement
>            Reporter: Edward J. Yoon
>
> Re-partitioning the data is a very expensive operation. Currently, we process read/write operations sequentially through the HDFS API in BSPJobClient on the client side. This can trigger "too many open files" errors, incurs HDFS overhead, and performs slowly.
> We need to find another way to re-partition the data.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
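The superstep-based redistribution discussed above only works if every task maps a given record to the same target peer. A minimal sketch of that mapping, assuming a simple hash partitioner; the class and method names here are illustrative, not Hama's actual API:

```java
// Illustrative sketch (hypothetical names, not the Hama API): the
// deterministic key-to-peer mapping a repartitioning superstep would
// use before sending each record to its target task via messages.
public class HashPartitionSketch {

    // Map a record key to one of numPeers tasks. Masking with
    // Integer.MAX_VALUE clears the sign bit, so the index stays
    // non-negative even for negative hashCode values.
    static int targetPeer(String key, int numPeers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPeers;
    }

    public static void main(String[] args) {
        int numPeers = 4;
        String[] keys = {"vertexA", "vertexB", "vertexC"};
        for (String k : keys) {
            // Every task computes the same assignment for the same key,
            // so the data lands on a consistent peer after the superstep.
            System.out.println(k + " -> peer " + targetPeer(k, numPeers));
        }
    }
}
```

Because the assignment is computed locally by each task, no client-side sequential file handling is needed; the distribution happens inside the cluster during the first superstep.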