[ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709801#action_12709801 ]
Devaraj Das commented on HADOOP-2141:
-------------------------------------

Went through the patch. Looks good for the most part. Some comments:

1) I don't think we need to sort tasks on expected time to completion. Sorting on progress rate should be good enough. On a related note, we will have a fix pretty soon that will smoothen the progress curve (HADOOP-5572).

2) In the patch, jobs will always launch speculative tasks if given a chance. We should launch speculative tasks only for the slower tasks. Offline, you were saying that you would implement the standard-deviation technique and launch speculative tasks only for the ones that are lagging behind in their progress rates by one standard deviation. We should implement some technique like that.

3) You had commented:
bq. It might make more sense to just assume that nodes who haven't reported back progress (regardless of whether they have been assigned a task for this job or not) are not laggards.
I am +1 for this. Please remove the code where you check whether a TT ran a task at all before deciding whether it is a laggard. I guess we can give it the benefit of the doubt. In the future, we may consider the overall performance of TTs w.r.t. all tasks they have run so far and make decisions based on that for this special case.

4) The statistics update upon successful completion of a task has a problem: tip.execStartTime is global to all attempts, not per attempt. So the tip duration wouldn't always reflect the time the current TT took to run the task (take a case where an attempt failed on some other TT and got re-executed on this TT).
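The one-standard-deviation idea from point 2 could look roughly like the following sketch. The class and method names here are hypothetical illustrations, not code from the patch: a task is a speculation candidate only if its progress rate is more than one standard deviation below the mean rate of its peers.

```java
import java.util.Arrays;

// Hypothetical sketch of the standard-deviation laggard check discussed
// in point 2; names are illustrative, not from the HADOOP-2141 patch.
public class LaggardCheck {

    // Returns true if this task's progress rate lags the mean of all
    // running tasks' rates by more than one standard deviation.
    static boolean isLaggard(double taskRate, double[] allRates) {
        double mean = Arrays.stream(allRates).average().orElse(0.0);
        double variance = Arrays.stream(allRates)
                .map(r -> (r - mean) * (r - mean))
                .average().orElse(0.0);
        double stdDev = Math.sqrt(variance);
        return taskRate < mean - stdDev;
    }

    public static void main(String[] args) {
        // Progress per second; the last task is clearly slower.
        double[] rates = {0.010, 0.011, 0.009, 0.010, 0.002};
        System.out.println(isLaggard(0.002, rates)); // prints true
        System.out.println(isLaggard(0.010, rates)); // prints false
    }
}
```

With this gate in place, a job no longer launches speculative attempts whenever it gets a chance; only tasks that are statistically behind their peers qualify.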
> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.patch, HADOOP-2141-v2.patch, HADOOP-2141-v3.patch, HADOOP-2141-v4.patch, HADOOP-2141-v5.patch, HADOOP-2141-v6.patch, HADOOP-2141.patch, HADOOP-2141.v7.patch, HADOOP-2141.v8.patch
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk.
> Devaraj pointed out
> bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop making progress.
> Devaraj suggested
> bq. Maybe, we should introduce a condition for average completion time for tasks in the speculative execution check.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.