[ https://issues.apache.org/jira/browse/HAMA-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461851#comment-13461851 ]
Edward J. Yoon commented on HAMA-647:
-------------------------------------
{code}
   protected long computeSplitSize(long goalSize, long minSize, long blockSize) {
-    return Math.max(minSize, Math.min(goalSize, blockSize));
+    if (goalSize > blockSize) {
+      return Math.max(minSize, Math.max(goalSize, blockSize));
+    } else {
+      return Math.max(minSize, Math.min(goalSize, blockSize));
+    }
{code}
This is a good catch.
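To make the effect of the change concrete, here is a minimal sketch that runs both versions of the formula side by side. The class name, the 64 MB block size, the 10 GB input, and the 4 requested splits are illustrative assumptions, not values taken from the patch.

```java
public class SplitSizeSketch {
    // Formula before the patch: the split size is capped at the block size.
    public static long originalSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Formula after the patch, as quoted in the diff above: when the goal
    // size exceeds the block size, the goal size is used instead of the cap.
    public static long patchedSplitSize(long goalSize, long minSize, long blockSize) {
        if (goalSize > blockSize) {
            return Math.max(minSize, Math.max(goalSize, blockSize));
        } else {
            return Math.max(minSize, Math.min(goalSize, blockSize));
        }
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;          // assumed 64 MB block
        long totalSize = 10L * 1024 * 1024 * 1024;   // assumed 10 GB input
        int numSplits = 4;                           // assumed number of requested splits
        long minSize = 1;
        long goalSize = totalSize / numSplits;

        long before = originalSplitSize(goalSize, minSize, blockSize);
        long after = patchedSplitSize(goalSize, minSize, blockSize);

        // Approximate number of splits each formula leads to for this input.
        System.out.println("original: splitSize=" + before + ", splits=" + totalSize / before);
        System.out.println("patched:  splitSize=" + after + ", splits=" + totalSize / after);
    }
}
```

Under these assumed numbers the original formula caps each split at one block, so the input breaks into far more splits than the 4 requested, while the patched formula keeps the split count at the requested value.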
By the way,
{code}
@@ -214,9 +215,13 @@
}
}
return splits.toArray(new FileSplit[splits.size()]);
+ } else if (files.length == 1) {
+ goalSize = totalSize / (numSplits == 0 ? 1 : numSplits - 1);
{code}
If files.length == 1 and numSplits == 1, Java will throw an ArithmeticException,
because numSplits - 1 equals zero, correct?
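A small sketch reproducing the concern: the first method uses the divisor exactly as quoted in the diff, and the second shows one possible guard. The class name, both method names, and the guard itself are hypothetical, and the guard is not necessarily the fix the patch should adopt.

```java
public class GoalSizeSketch {
    // Divisor as quoted in the patch: only numSplits == 0 is guarded,
    // so numSplits == 1 yields a divisor of zero.
    public static long patchedGoalSize(long totalSize, int numSplits) {
        return totalSize / (numSplits == 0 ? 1 : numSplits - 1);
    }

    // Hypothetical guard (an assumption, not part of the patch):
    // never let the divisor drop below 1.
    public static long guardedGoalSize(long totalSize, int numSplits) {
        return totalSize / Math.max(1, numSplits - 1);
    }

    public static void main(String[] args) {
        long totalSize = 1024L; // assumed total input size in bytes

        try {
            patchedGoalSize(totalSize, 1);
        } catch (ArithmeticException e) {
            // Integer division by zero on long operands throws ArithmeticException.
            System.out.println("patched, numSplits=1: " + e);
        }

        System.out.println("guarded, numSplits=1: " + guardedGoalSize(totalSize, 1));
    }
}
```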
> Make the input spliter robustly
> --------------------------------
>
> Key: HAMA-647
> URL: https://issues.apache.org/jira/browse/HAMA-647
> Project: Hama
> Issue Type: Improvement
> Components: bsp core
> Affects Versions: 0.5.0, 0.6.0
> Reporter: Yuesheng Hu
> Assignee: Yuesheng Hu
> Priority: Critical
> Fix For: 0.6.0
>
> Attachments: HAMA-647-2.patch, HAMA-647.patch
>
>
> Currently, the splitter in FileInputFormat is based on MapReduce's
> splitter. But Hama differs from MapReduce: a Hama task cannot be left
> pending until a slot becomes free. So the current splitter is not suitable
> for Hama. When the input file is small this may be fine, but when the input
> is very large the number of splits will be very large too, even if our
> cluster is powerful enough to handle the input. For more details, please
> see the comments.