Zhizhen Hou created MAPREDUCE-7080:

             Summary: Default speculator won't speculate the last several 
submitted reduce tasks if the total task num is large
                 Key: MAPREDUCE-7080
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7080
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
    Affects Versions: 2.7.5
            Reporter: Zhizhen Hou

DefaultSpeculator speculates a given task at most once.

By default, the maximum number of tasks that may be speculated is
max(max(10, 0.01 * tasks.size), 0.1 * running tasks).
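
As a rough sketch of the limit above (illustrative only, not the Hadoop 
source; the class, method, and constant names are made up for this example):

```java
// Illustrative sketch of the limit described above; NOT the Hadoop source.
// Class, method, and constant names are hypothetical.
final class SpeculationCap {
    static final int MIN_ALLOWED = 10;             // floor on the limit
    static final double PROPORTION_TOTAL = 0.01;   // share of all tasks
    static final double PROPORTION_RUNNING = 0.1;  // share of running tasks

    // max(max(10, 0.01 * totalTasks), 0.1 * runningTasks), truncated to int
    static int allowed(int totalTasks, int runningTasks) {
        double byTotal = Math.max(MIN_ALLOWED, PROPORTION_TOTAL * totalTasks);
        double byRunning = PROPORTION_RUNNING * runningTasks;
        return (int) Math.max(byTotal, byRunning);
    }

    public static void main(String[] args) {
        // 5000 tasks in total, 1000 of them running at once
        System.out.println(SpeculationCap.allowed(5000, 1000));
    }
}
```

With 5000 tasks in total and 1000 running, the running-task term dominates 
and the limit is 100.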

I set mapreduce.job.reduce.slowstart.completedmaps = 1 so that reduces start 
only after all the map tasks have finished.

The cluster has 1000 vcores, and the job has 5000 reduce tasks.

At first, 1000 reduce tasks can run simultaneously, so the speculator may 
speculate at most 0.1 * 1000 = 100 tasks. Reduce tasks with less data finish 
quickly, and by default the speculator speculates one task per second. A task 
may be chosen for speculation simply because it has more data to process. The 
speculator will therefore speculate 100 tasks within about 100 seconds.

When 4900 reduces have finished, suppose a reduce with a lot of data to 
process is placed on a slow machine. The speculation opportunities have 
already been used up, so it will not be speculated. This can increase the 
execution time of the job.
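
The scenario can be modeled roughly as follows. This is a sketch under the 
assumption that a new speculation is allowed only while the number of tasks 
already speculated is below the limit; the class and method names are 
hypothetical and the code is not taken from Hadoop:

```java
// Rough model of the scenario above; NOT the Hadoop source.
// Assumption: a task may be speculated only while the count of tasks
// already speculated is below the limit. All names are hypothetical.
final class BudgetExhaustion {
    // Limit as described in this report:
    // max(max(10, 0.01 * totalTasks), 0.1 * runningTasks)
    static int limit(int totalTasks, int runningTasks) {
        return (int) Math.max(Math.max(10, 0.01 * totalTasks),
                              0.1 * runningTasks);
    }

    static boolean maySpeculate(int alreadySpeculated,
                                int totalTasks, int runningTasks) {
        return alreadySpeculated < limit(totalTasks, runningTasks);
    }

    public static void main(String[] args) {
        // Early in the job: 1000 reduces running, nothing speculated yet.
        System.out.println(maySpeculate(0, 5000, 1000));
        // Late in the job: 100 reduces left, 100 tasks already speculated.
        System.out.println(maySpeculate(100, 5000, 100));
    }
}
```

In this model the late-stage limit falls to max(10, 50, 10) = 50, which is 
below the 100 tasks already speculated, so the slow reduce gets no 
speculative attempt.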

In short, speculation opportunities may be wasted early on, only because the 
execution time of the reduces with less data is taken as the average time. At 
the end of the job there is no speculation opportunity left, especially for 
the last several running tasks, since the limit depends on the number of 
running tasks.


In my opinion, the number of tasks that may be speculated could be scaled by 
the square of the fraction of finished tasks. For example, if ninety percent 
of the tasks are finished, only 0.9 * 0.9 = 0.81 of the speculation 
opportunities can be used. This leaves enough opportunities for later tasks.
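
One possible reading of this proposal, as a sketch only (not a patch; the 
class and method names are hypothetical, and integer arithmetic is used just 
to avoid rounding surprises):

```java
// Sketch of the proposed scaling; one reading of the idea, not a patch.
// All names are hypothetical.
final class QuadraticBudget {
    // Portion of the limit usable so far:
    // limit * (finishedTasks / totalTasks)^2, rounded down.
    static int usable(int limit, int finishedTasks, int totalTasks) {
        if (totalTasks <= 0) {
            return 0; // no tasks, nothing to speculate
        }
        long f = finishedTasks;
        long t = totalTasks;
        return (int) ((long) limit * f * f / (t * t));
    }

    public static void main(String[] args) {
        // 90% of 5000 tasks finished, limit of 100 speculations
        System.out.println(QuadraticBudget.usable(100, 4500, 5000));
    }
}
```

With a limit of 100 and 4500 of 5000 tasks finished, 81 speculations may 
have been used, leaving 19 for the remaining tasks.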

