[
https://issues.apache.org/jira/browse/HAMA-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462439#comment-13462439
]
Yuesheng Hu commented on HAMA-647:
----------------------------------
Hi Edward,
The patch didn't fix everything.
{code}
else if (files.length == 1) {
  goalSize = totalSize / (numSplits == 0 ? 1 : numSplits - 1);
} else {
  goalSize = totalSize
      / (numSplits == 0 ? 1 : numSplits - files.length / 2 + 1);
}
LOG.debug("numSplits: " + numSplits);
{code}
*When there are too many input files, goalSize may be negative!*
I will keep looking for an algorithm to solve this problem.
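A minimal, self-contained sketch (not Hama code; the class and method names are made up for illustration) of the quoted formula shows how the divisor goes negative once files.length grows past roughly 2 * numSplits:

{code}
// Standalone demo of the quoted split-size formula.
// With many input files, (numSplits - files.length / 2 + 1) drops
// below zero, so goalSize comes out negative.
public class GoalSizeDemo {
  static long goalSize(long totalSize, int numSplits, int numFiles) {
    if (numFiles == 1) {
      // Note: also divides by zero when numSplits == 1.
      return totalSize / (numSplits == 0 ? 1 : numSplits - 1);
    }
    return totalSize / (numSplits == 0 ? 1 : numSplits - numFiles / 2 + 1);
  }

  public static void main(String[] args) {
    // numSplits = 4, 100 input files: divisor = 4 - 50 + 1 = -45
    System.out.println(goalSize(1000L, 4, 100)); // prints -22 (negative!)
  }
}
{code}

For example, with numSplits = 4 and 100 input files, the divisor is 4 - 100 / 2 + 1 = -45, and goalSize ends up negative. There is also a division by zero lurking in the single-file branch when numSplits == 1.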
Btw, the ideal number of input files is either 1 or equal to
numSplits (_setTasks_ or _taskCapacity_). This is good for load balancing, but
reduces flexibility. This rule should be documented in the tutorial.
> Make the input spliter robustly
> --------------------------------
>
> Key: HAMA-647
> URL: https://issues.apache.org/jira/browse/HAMA-647
> Project: Hama
> Issue Type: Improvement
> Components: bsp core
> Affects Versions: 0.5.0, 0.6.0
> Reporter: Yuesheng Hu
> Assignee: Yuesheng Hu
> Priority: Critical
> Fix For: 0.6.0
>
> Attachments: HAMA-647-2.patch, HAMA-647.patch
>
>
> Currently, the splitter in FileInputFormat is based on MapReduce's
> splitter. But Hama is different from MapReduce: a Hama task cannot be
> held pending until a slot becomes free. So the current splitter is not
> suitable for Hama. When the input file is small it may be OK, but when the
> input is very large, the number of splits will be very large too, even if
> our cluster is powerful enough to handle the input. For more details,
> please see the comments.