[ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das updated HADOOP-2141:
--------------------------------

    Attachment: 2141.4.patch

The attached patch fixes some bugs that were present in the earlier patch. Also, the standard deviation of task progress is what is used to determine whether to speculate.

I also wrote a unit test that fakes JobInProgress and the Clock. The first is required so that task scheduling is transparent to the testcase even though it uses the framework's lower-level methods. This keeps things very close to reality, and at the same time the testcase can make equality checks on the taskAttemptID that it gets back when it asks for one from the fake JobInProgress. The Clock is faked so that tasks can be made to progress over time artificially and things like progress rates can be easily computed for testing purposes.

The other option was to use a MiniMRCluster, but it seems it would not be easy to achieve what I have in the testcase that way. The third option was to spoof heartbeats and not fake the JobInProgress, but that also did not seem easily manageable.

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.4.patch, 2141.patch, HADOOP-2141-v2.patch,
> HADOOP-2141-v3.patch, HADOOP-2141-v4.patch, HADOOP-2141-v5.patch,
> HADOOP-2141-v6.patch, HADOOP-2141.patch, HADOOP-2141.v7.patch,
> HADOOP-2141.v8.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk.
> Devaraj pointed out
> bq. One of the conditions that must be met for launching a speculative
> instance of a task is that it must be at least 20% behind the average
> progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop
> making progress.
> Devaraj suggested
> bq. Maybe, we should introduce a condition for average completion time for
> tasks in the speculative execution check.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
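
For readers following the comment above, here is a minimal sketch of how a standard-deviation check on task progress rates might be driven by a fake clock in a test. It is not the code in 2141.4.patch. FakeClock, TaskSketch, SpeculationSketch, and the "more than one standard deviation below the mean rate" threshold are all assumptions made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fake clock: time only moves when the test advances it. */
class FakeClock {
    private long now = 0;

    long getTime() { return now; }

    void advance(long millis) { now += millis; }
}

/** Hypothetical task record: progress in [0, 1] plus the time it started. */
class TaskSketch {
    final String id;
    final long startTime;
    double progress = 0.0;

    TaskSketch(String id, long startTime) {
        this.id = id;
        this.startTime = startTime;
    }

    /** Progress per millisecond since start (elapsed time is at least 1 ms). */
    double progressRate(long now) {
        long elapsed = Math.max(1, now - startTime);
        return progress / elapsed;
    }
}

class SpeculationSketch {
    /**
     * Returns the ids of tasks whose progress rate is more than one
     * standard deviation below the mean rate (assumed threshold).
     */
    static List<String> slowTasks(List<TaskSketch> tasks, long now) {
        List<String> slow = new ArrayList<>();
        int n = tasks.size();
        if (n == 0) {
            return slow;
        }
        // First pass: mean progress rate.
        double sum = 0.0;
        for (TaskSketch t : tasks) {
            sum += t.progressRate(now);
        }
        double mean = sum / n;
        // Second pass: population standard deviation of the rates.
        double sumSq = 0.0;
        for (TaskSketch t : tasks) {
            double d = t.progressRate(now) - mean;
            sumSq += d * d;
        }
        double std = Math.sqrt(sumSq / n);
        // Third pass: flag the stragglers.
        for (TaskSketch t : tasks) {
            if (t.progressRate(now) < mean - std) {
                slow.add(t.id);
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        FakeClock clock = new FakeClock();
        List<TaskSketch> tasks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            tasks.add(new TaskSketch("r" + i, clock.getTime()));
        }
        clock.advance(1000);              // pretend one second has passed
        tasks.get(0).progress = 0.9;
        tasks.get(1).progress = 0.9;
        tasks.get(2).progress = 0.9;
        tasks.get(3).progress = 0.1;      // the straggler
        System.out.println(slowTasks(tasks, clock.getTime()));
    }
}
```

Because the clock is advanced by hand, the progress rates in such a test are exact and repeatable, which is the property the comment relies on when it prefers a faked Clock over a MiniMRCluster.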