[ 
https://issues.apache.org/jira/browse/HAMA-647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462439#comment-13462439
 ] 

Yuesheng Hu commented on HAMA-647:
----------------------------------

Hi Edward, 
The patch didn't fix everything. 
{code}
else if (files.length == 1) {
      goalSize = totalSize / (numSplits == 0 ? 1 : numSplits - 1);
    } else {
      goalSize = totalSize
          / (numSplits == 0 ? 1 : numSplits - files.length / 2 + 1);
    }
    LOG.debug("numSplits: " + numSplits); 
{code}

*When there are too many input files, goalSize may be negative!*
I will keep looking for an algorithm to solve this problem.
Btw, the ideal number of input files is either 1 or equal to 
numSplits (_setTasks_ or _taskCapacity_). This is good for load balancing, but 
reduces flexibility. This rule should be documented in the tutorial.
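
The negative goalSize can be reproduced with a minimal sketch of the divisor arithmetic from the two branches quoted above. The class and method names are hypothetical and not part of Hama, the branch that precedes the quoted "else if" is not shown in the snippet and is left out here, and totalSize is an arbitrary illustrative value. With numSplits = 3, the divisor works out to 3 for 2 files, 0 for 8 files, and -1 for 10 files.

```java
// Sketch only: reproduces the divisor arithmetic from the quoted snippet.
// GoalSizeSketch and divisor() are hypothetical names, not Hama APIs.
final class GoalSizeSketch {

    // Returns the divisor that the quoted branches would use for goalSize.
    static long divisor(int numSplits, int numFiles) {
        if (numSplits == 0) {
            return 1; // both quoted branches fall back to 1 here
        }
        if (numFiles == 1) {
            return numSplits - 1; // single-file branch
        }
        return numSplits - numFiles / 2 + 1; // multi-file branch (integer division)
    }

    public static void main(String[] args) {
        long totalSize = 1_000_000L; // assumed total input size in bytes
        int numSplits = 3;           // assumed number of requested splits
        for (int numFiles : new int[] {2, 8, 10}) {
            long d = divisor(numSplits, numFiles);
            // A zero divisor would throw ArithmeticException in the original code.
            String goal = (d == 0) ? "division by zero" : Long.toString(totalSize / d);
            System.out.println("files=" + numFiles + " divisor=" + d + " goalSize=" + goal);
        }
    }
}
```

Under these assumptions the divisor is not only negative for many files but can also be exactly zero, for example with a single file and numSplits = 1, so any replacement formula probably needs to clamp the divisor to at least 1.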
                
> Make the  input spliter robustly
> --------------------------------
>
>                 Key: HAMA-647
>                 URL: https://issues.apache.org/jira/browse/HAMA-647
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp core
>    Affects Versions: 0.5.0, 0.6.0
>            Reporter: Yuesheng Hu
>            Assignee: Yuesheng Hu
>            Priority: Critical
>             Fix For: 0.6.0
>
>         Attachments: HAMA-647-2.patch, HAMA-647.patch
>
>
> Currently, the splitter in FileInputFormat is based on MapReduce's 
> splitter. But Hama differs from MapReduce: a Hama task cannot stay 
> pending until a slot becomes free. So the current splitter is not suitable 
> for Hama. When the input is small, it may be OK, but when the input is very 
> large, the number of splits will be very large too, even if our cluster is 
> powerful enough to handle the input. For more details, please see the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
