[ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-2141:
--------------------------------

    Attachment: 2141.4.patch

The attached patch fixes some bugs that were in the earlier patch. Also, the
standard deviation of task progress is what this patch uses to determine
whether to speculate. I also wrote a unit test that fakes JobInProgress and
the Clock. Faking JobInProgress keeps task scheduling transparent to the
testcase, while the fake still uses the framework's lower-level methods. This
keeps things very close to reality, and at the same time the testcase can make
equality checks on the taskAttemptID it gets back when it asks the fake
JobInProgress for one. The Clock is faked so that tasks can be made to progress
over time artificially, and things like progress rates can be easily computed
for testing purposes. The other option was to use a MiniMRCluster, but it
seemed that it would not be easy to achieve what I have in the testcase that
way. A third option was to spoof heartbeats and not fake the JobInProgress,
but that also did not seem easily manageable.

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.4.patch, 2141.patch, HADOOP-2141-v2.patch, 
> HADOOP-2141-v3.patch, HADOOP-2141-v4.patch, HADOOP-2141-v5.patch, 
> HADOOP-2141-v6.patch, HADOOP-2141.patch, HADOOP-2141.v7.patch, 
> HADOOP-2141.v8.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative 
> instance of a task is that it must be at least 20% behind the average 
> progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop 
> making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for 
> tasks in the speculative execution check. 
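
A sketch of why the quoted 20% rule misses this hang, and how a
completion-time condition would catch it. The 20% gap is treated as an
absolute difference of 0.2 in progress. The "overdue" rule, which speculates
when a task has run longer than some factor of the average duration of
completed tasks, is one hypothetical reading of the suggestion above. The job
sizes, durations and the factor 2.0 are illustrative numbers, not data from
the report.

```java
import java.util.Arrays;

// Hypothetical illustration of the two conditions discussed in this issue.
class SpeculationConditions {
    // The "at least 20% behind the average progress" gap from the description.
    static final double PROGRESS_GAP = 0.2;

    /** Gap rule: speculate only if progress trails the average by >= PROGRESS_GAP. */
    static boolean behindAverage(double progress, double[] allProgress) {
        double avg = Arrays.stream(allProgress).average().orElse(0.0);
        return avg - progress >= PROGRESS_GAP;
    }

    /** Assumed completion-time rule: running longer than factor x average completed duration. */
    static boolean overdue(long runningMillis, long[] completedMillis, double factor) {
        if (completedMillis.length == 0) return false; // no baseline yet
        double avg = Arrays.stream(completedMillis).average().getAsDouble();
        return runningMillis > factor * avg;
    }

    public static void main(String[] args) {
        // Illustrative job: 6 reduces finished, 4 stuck at 95% on a bad disk.
        double[] progress = {1, 1, 1, 1, 1, 1, 0.95, 0.95, 0.95, 0.95};
        long[] completed = {600_000, 620_000, 580_000, 610_000, 590_000, 600_000};
        long stuckFor = 3_600_000; // one hour, still at 95%

        System.out.println("gap rule fires: " + behindAverage(0.95, progress));
        System.out.println("completion-time rule fires: "
                + overdue(stuckFor, completed, 2.0));
    }
}
```

With these numbers the average progress is 0.98, so a task at 0.95 is only
0.03 behind and the gap rule prints false. The completion-time rule prints true
because one hour exceeds twice the 10-minute average.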

-- 
This message is automatically generated by JIRA.