[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhizhen Hou resolved MAPREDUCE-7081.
------------------------------------
    Resolution: Invalid

> Default speculator won't speculate the last several submitted reduced task if 
> the total task num is large
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7081
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7081
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 2.9.0, 2.7.5
>            Reporter: Zhizhen Hou
>            Priority: Major
>
> DefaultSpeculator speculates each task at most once. By default, the number
> of tasks it may speculate is
> max(max(10, 0.01 * tasks.size), 0.1 * running tasks).
> I set mapreduce.job.reduce.slowstart.completedmaps = 1 so that reduces start
> only after all the map tasks have finished. The cluster has 1000 vcores, and
> the job has 5000 reduce tasks. At first, 1000 reduce tasks can run
> simultaneously, so the speculator can speculate at most 0.1 * 1000 = 100
> tasks. Reduce tasks with less data finish quickly, and the speculator
> speculates one task per second by default. A task may be chosen for
> speculation only because it has more data to process. The speculator will
> therefore speculate 100 tasks within 100 seconds. Once 4900 reduces have
> finished, suppose a reduce with a lot of data to process is placed on a slow
> machine. The speculation opportunities have already been used up, so it will
> not be speculated. This can increase the execution time of the job
> significantly.
> In short, the speculation opportunities may be wasted early in the job, only
> because reduces with less data to process set the average execution time. At
> the end of the job no speculation opportunity is left, especially for the
> last several running tasks, because the limit is judged by the number of
> running tasks.
> In my opinion, the number of running tasks should not determine the number
> of speculation opportunities. The number of tasks that may be speculated
> could instead be judged by the square of the fraction of finished tasks. For
> example, if ninety percent of the tasks are finished, only 0.9 * 0.9 = 0.81
> of the speculation opportunities can be used. This leaves enough
> opportunities for later tasks.
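
The arithmetic in the description can be checked with a small sketch. This is
not Hadoop code: the function and constant names are hypothetical, the
constants are the defaults quoted in the issue, and the `budget` parameter of
the proposed rule is an assumption, since the issue does not say which total
the squared fraction should scale.

```python
# Illustrative model of the speculation limit discussed in MAPREDUCE-7081.
# All names are hypothetical; only the constants come from the issue text.
MIN_ALLOWED = 10           # floor on the number of speculative tasks
PROPORTION_TOTAL = 0.01    # share of all tasks that may be speculated
PROPORTION_RUNNING = 0.1   # share of running tasks that may be speculated


def current_limit(total_tasks: int, running_tasks: int) -> int:
    """Limit as quoted in the issue: max(max(10, 0.01*total), 0.1*running)."""
    by_total = max(MIN_ALLOWED, PROPORTION_TOTAL * total_tasks)
    return int(max(by_total, PROPORTION_RUNNING * running_tasks))


def proposed_limit(total_tasks: int, finished_tasks: int, budget: int) -> int:
    """Proposed rule: release the budget by the squared finished fraction.

    `budget` is an assumed fixed number of speculation opportunities.
    Integer arithmetic avoids floating-point rounding in the square.
    """
    return budget * finished_tasks ** 2 // total_tasks ** 2


if __name__ == "__main__":
    # Start of the reduce phase: 5000 reduces, 1000 of them running.
    print(current_limit(5000, 1000))
    # Near the end: few tasks running, so the limit falls back to the
    # share of total tasks, below the 100 speculations already used.
    print(current_limit(5000, 3))
    # Proposed rule with 90% of the tasks finished and a budget of 100.
    print(proposed_limit(5000, 4500, 100))
```

Under these assumptions the current rule allows 100 speculations while 1000
reduces are running but only 50 once few remain, so a job that has already
speculated 100 tasks cannot speculate a late straggler. The proposed rule
would hold part of the budget back until most tasks have finished.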



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

