Zhizhen Hou created MAPREDUCE-7081:
--------------------------------------
Summary: Default speculator won't speculate the last several
submitted reduce tasks if the total task num is large
Key: MAPREDUCE-7081
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7081
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv2
Affects Versions: 2.7.5, 2.9.0
Reporter: Zhizhen Hou
DefaultSpeculator speculates on a given task at most once. By default, the
number of speculative attempts allowed is
max(max(10, 0.01 * tasks.size), 0.1 * running tasks).
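
As a rough illustration of the formula quoted above, here is a minimal Java
sketch. The class, method, and constant names are hypothetical and are not
taken from the Hadoop source; only the three default values (10, 0.01, 0.1)
come from this report.

```java
// Hypothetical helper, not Hadoop code: evaluates the cap formula
// max(max(10, 0.01 * tasks.size), 0.1 * running tasks) from the report.
final class SpeculationCap {
    static final int MIN_ALLOWED = 10;           // floor on the cap
    static final double TOTAL_FRACTION = 0.01;   // share of all tasks
    static final double RUNNING_FRACTION = 0.1;  // share of running tasks

    static int allowedSpeculations(int totalTasks, int runningTasks) {
        double byTotal = Math.max(MIN_ALLOWED, TOTAL_FRACTION * totalTasks);
        return (int) Math.max(byTotal, RUNNING_FRACTION * runningTasks);
    }

    public static void main(String[] args) {
        // 5000 reduces, 1000 running at once: the running-task term wins.
        System.out.println(allowedSpeculations(5000, 1000)); // prints 100
        // Same job with only 100 tasks still running: the total-task term wins.
        System.out.println(allowedSpeculations(5000, 100));  // prints 50
    }
}
```

Under these assumptions the cap is largest while many tasks are running and
shrinks as the job drains.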
I set mapreduce.job.reduce.slowstart.completedmaps = 1 so that reduces start
only after all map tasks have finished. The cluster has 1000 vcores, and the
job has 5000 reduce tasks. At first, 1000 reduce tasks can run simultaneously,
so the speculator can speculate on at most 0.1 * 1000 = 100 tasks. Reduce
tasks with little data finish quickly, and the speculator launches one
speculative attempt per second by default. A task may be chosen for
speculation simply because it has more data to process. The speculator will
therefore speculate on 100 tasks within 100 seconds. Later, when 4900 reduces
have finished, a reduce that has a lot of data to process and is placed on a
slow machine will not be speculated, because the speculation opportunities
have already been used up. This can increase the job's execution time
significantly.
In short, the speculation opportunities may be wasted early in the job, only
because the short execution time of reduces with little data is taken as the
average. At the end of the job no opportunity is left, especially for the
last few running tasks, because the limit is derived from the number of
running tasks.
In my opinion, the number of running tasks should not determine the number of
speculation opportunities. Instead, the number of tasks that may be speculated
can be scaled by the square of the fraction of finished tasks. For example, if
ninety percent of the tasks are finished, only 0.9 * 0.9 = 0.81 of the
speculation opportunities can be used. This leaves enough opportunities for
later tasks.
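
The proposal could be sketched in Java roughly as follows. This is only one
reading of the idea, not a patch: the report does not say what the total
number of opportunities should be, so the sketch assumes it is
max(10, 0.01 * total tasks). All class and method names are hypothetical.

```java
// Hypothetical sketch of the proposed rule, not Hadoop code: the usable
// share of the speculation opportunities grows with the square of the
// fraction of finished tasks.
final class QuadraticSpeculationBudget {
    static final int MIN_ALLOWED = 10;          // assumed floor
    static final double TOTAL_FRACTION = 0.01;  // assumed share of all tasks

    // Assumption: total opportunities depend only on the total task count.
    static int totalBudget(int totalTasks) {
        return (int) Math.max(MIN_ALLOWED, TOTAL_FRACTION * totalTasks);
    }

    // Opportunities usable so far: total * (finished fraction)^2.
    static int usableBudget(int totalTasks, int finishedTasks) {
        if (totalTasks <= 0) {
            return 0;
        }
        double finished = (double) finishedTasks / totalTasks;
        return (int) (totalBudget(totalTasks) * finished * finished);
    }

    static boolean maySpeculate(int totalTasks, int finishedTasks,
                                int alreadySpeculated) {
        return alreadySpeculated < usableBudget(totalTasks, finishedTasks);
    }

    public static void main(String[] args) {
        // 5000 reduces give 50 opportunities in total under the assumption.
        System.out.println(usableBudget(5000, 4500)); // prints 40 (0.81 * 50)
        System.out.println(usableBudget(5000, 4900)); // prints 48
    }
}
```

With this rule a few opportunities are always held back until the job is
nearly complete, so a straggler among the last tasks can still be speculated.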