[ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708781#action_12708781 ]
Devaraj Das commented on HADOOP-2141:
-------------------------------------

I am going through the patch. Some early comments:

1) I don't understand the motivation for having two time fields, dispatchTime and mostRecentStartTime. Both of them are updated in the same code flow: mostRecentStartTime is updated in TaskInProgress.getTaskToRun, and dispatchTime is updated just after assignTasks in the JobTracker. But getTaskToRun is called from within assignTasks anyway, so why have two fields representing the same information?

2) The locality code seems quite redundant. The locality aspect actually conflicts with the algorithm for choosing tasks to speculate. In the current (unpatched) codebase, we get the running-tasks list based on locality w.r.t. the tracker that just came in asking for a task, and then see if something can be speculatively run. In the patch, *all* running tasks are sorted globally by progress rate and expected time to completion, and a task from that list is handed out. Locality could be no more than a coincidence here. I will ponder some more whether to leave that code around or simplify it to remove the locality aspects for running tasks.

Now, coming to Eric's concern about a slow disk slowing the progress of a task: if the speculative task also starts reading input from the same replica, then yes, there is a problem. So yes, this is an interesting area for further research!
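The global ranking described above (all running tasks sorted by progress rate and expected time to completion, ignoring locality) could be sketched roughly as follows. This is a minimal illustration, not the patch's actual code; the class and field names (RunningTask, progressRate, expectedMillisToCompletion) are hypothetical:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: rank running tasks for speculation by expected
// time to completion, estimated from each task's observed progress rate.
public class SpeculationRanker {

    static class RunningTask {
        final String id;
        final double progress;      // fraction complete, 0.0 .. 1.0
        final long runningMillis;   // wall-clock time since dispatch

        RunningTask(String id, double progress, long runningMillis) {
            this.id = id;
            this.progress = progress;
            this.runningMillis = runningMillis;
        }

        // Progress per millisecond; guards against zero elapsed time.
        double progressRate() {
            return progress / Math.max(1, runningMillis);
        }

        // Remaining work divided by the observed rate gives the estimate.
        double expectedMillisToCompletion() {
            return (1.0 - progress) / Math.max(progressRate(), 1e-12);
        }
    }

    // Sort so the task expected to finish last comes first: that task is
    // the best candidate for a speculative attempt, wherever it runs.
    static List<RunningTask> rankForSpeculation(List<RunningTask> tasks) {
        List<RunningTask> sorted = new ArrayList<>(tasks);
        sorted.sort(Comparator.comparingDouble(
                RunningTask::expectedMillisToCompletion).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<RunningTask> tasks = new ArrayList<>();
        tasks.add(new RunningTask("task_a", 0.95, 60_000));  // nearly done
        tasks.add(new RunningTask("task_b", 0.10, 60_000));  // straggler
        tasks.add(new RunningTask("task_c", 0.50, 60_000));
        // The straggler ranks first regardless of which tracker asked,
        // which is why locality can only be a coincidence in this scheme.
        System.out.println(rankForSpeculation(tasks).get(0).id);
    }
}
```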
> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.patch, HADOOP-2141-v2.patch, HADOOP-2141-v3.patch, HADOOP-2141-v4.patch, HADOOP-2141-v5.patch, HADOOP-2141-v6.patch, HADOOP-2141.patch, HADOOP-2141.v7.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck at 95% completion because of a bad disk.
> Devaraj pointed out:
> bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also started up when tasks stop making progress.
> Devaraj suggested:
> bq. Maybe, we should introduce a condition for average completion time for tasks in the speculative execution check.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
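The 20%-behind-average trigger quoted in the description can be sketched as follows, and the sketch shows exactly why the stuck-at-95% reducers never qualified. This is an illustration of the condition as described, not Hadoop's actual implementation; the constant and method names are made up:

```java
import java.util.List;

// Hedged sketch of the speculation trigger described in the report: a task
// qualifies only if its progress trails the average progress of its peers
// by at least 20 percentage points.
public class SpeculationCheck {

    static final double SPECULATIVE_GAP = 0.20;

    static double averageProgress(List<Double> progresses) {
        double sum = 0.0;
        for (double p : progresses) sum += p;
        return progresses.isEmpty() ? 0.0 : sum / progresses.size();
    }

    // True when this task is far enough behind the pack to speculate on.
    static boolean shouldSpeculate(double taskProgress, List<Double> all) {
        return taskProgress < averageProgress(all) - SPECULATIVE_GAP;
    }

    public static void main(String[] args) {
        // The failure mode from the bug report: every reducer sits at 95%,
        // so no task is 20% behind the 95% average, and nothing is ever
        // speculated even though four tasks have stopped making progress.
        List<Double> all = List.of(0.95, 0.95, 0.95, 0.95);
        System.out.println(shouldSpeculate(0.95, all));  // prints false
    }
}
```

This is why the suggested completion-time condition matters: a progress-gap check alone cannot detect tasks that are close to done but no longer advancing.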