[ 
https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709729#action_12709729
 ] 

Andy Konwinski edited comment on HADOOP-2141 at 5/14/09 11:14 PM:
------------------------------------------------------------------

Responding to Devaraj's comments:

re 1) You are right, they were redundant as far as I can tell. I have removed 
the mostRecentStartTime and am now only using dispatchTime. It is now updated 
in TaskInProgress.getTaskToRun(), not JobTracker.assignTasks().
 
re 2) Devaraj, what you are saying about locality makes sense, and I think we 
need to think about it a bit more, but I want to get this patch submitted now 
with the changes and bug fixes I have already made.

Also, some other comments:

A)  I have updated isSlowTracker() to better handle the case where a task 
tracker hasn't successfully completed a task for this job yet. In the last 
patch (v8) I simply assumed it was a laggard in such cases, to be safe. Now I 
check whether the TT has been assigned a task for this job yet: if it hasn't, 
we give it the benefit of the doubt; if it has been assigned a task but hasn't 
finished it yet, we don't speculate on it. This should address the case 
Devaraj pointed out earlier of running in a cluster that has more nodes than 
we have tasks, or of adding a task tracker in the middle of a long job. It 
might make more sense to simply assume that nodes that haven't reported back 
progress (regardless of whether they have been assigned a task for this job 
or not) are not laggards.
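
To make the intended behavior concrete, here is a minimal sketch of that 
decision; the class, method, and parameter names are purely illustrative and 
are not the actual JobInProgress/TaskTracker API from the patch:

// Illustrative sketch only; names are hypothetical, not the patch's code.
class SpeculationGuardSketch {
  /**
   * Decide whether a tracker should be treated as slow when deciding
   * whether to hand it a speculative task for this job.
   */
  static boolean treatAsSlow(int tasksAssignedForJob, int tasksCompletedForJob) {
    if (tasksAssignedForJob == 0) {
      // Never assigned a task for this job yet: benefit of the doubt.
      return false;
    }
    if (tasksCompletedForJob == 0) {
      // Has been assigned work but finished nothing: no statistics to
      // compare against, so don't speculate on it for now.
      return true;
    }
    // Otherwise fall through to the mean/std comparison described below.
    return false;
  }
}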

B) Finally, Devaraj caught two very serious bugs in my math in isSlowTracker(). 
My current implementation of DataStatistics.std() calculates the variance, not 
the standard deviation; I should have been taking the square root of my 
formula. Also, I was considering trackers with faster tasks to be the 
laggards, when obviously it should be trackers with slower tasks that are 
considered the laggards.
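
In other words, std() has to take the square root of (sumSquares/n - mean^2). 
A minimal sketch of the corrected helper, assuming the mean/sumSquares 
bookkeeping this comment describes (this is not the actual DataStatistics 
class from the patch):

// Sketch of the corrected statistics; not the actual DataStatistics class.
class DataStatisticsSketch {
  private int count = 0;
  private double sum = 0.0;
  private double sumSquares = 0.0;

  void add(double x) {
    count++;
    sum += x;
    sumSquares += x * x;
  }

  double mean() {
    return count == 0 ? 0.0 : sum / count;
  }

  double var() {
    // E[X^2] - E[X]^2
    return count == 0 ? 0.0 : (sumSquares / count) - (mean() * mean());
  }

  double std() {
    // The v8 bug was returning var() directly (the variance).
    return Math.sqrt(var());
  }
}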

Walking through an example (given by Devaraj):

2 trackers run 3 maps each. TT1 takes 1 second to run each map. TT2 takes 2 
seconds to run each map. Assuming these figures, let's compute 
mapTaskStats.mean() and mapTaskStats.std(), and TT1's mean()/std(). Now if you 
assume that TT1 comes asking for a task, TT1 will be declared as slow. That 
should not happen.

The mapTaskStats.mean() would be 1.5 at the end of the 6 tasks. 
mapTaskStats.std() would be 0.25 (2.5 - 1.5*1.5), since the buggy std() 
actually returns the variance. TT1's mean() would be 1. The check in 
isSlowTracker would evaluate to true since (1 < (1.5 - 0.25)) (assuming 
slowNodeThreshold is 1). This is obviously wrong.
--

After fixing the bugs, for the numbers above, neither tracker would be 
considered a laggard:

mapTaskStats.mean() = (1+1+1+2+2+2)/6 = 1.5

mapTaskStats.sumSquares = (1^2 + 1^2 + 1^2 + 2^2 + 2^2 + 2^2) = 15
mapTaskStats.std() = (sumSquares/6 - mean*mean)^(1/2) = (15/6 - 1.5*1.5)^(1/2) 
= (0.25)^(1/2) = 0.5

Now, since we are using the default of one standard deviation, we expect that 
no more than 1/2 of the tasks will be considered slow. This follows from the 
one-sided Chebyshev inequality 
(http://en.wikipedia.org/w/index.php?title=Chebyshev%27s_inequality#Variant:_One-sided_Chebyshev_inequality).
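
For reference, the one-sided (Cantelli) form of the inequality for a 
distribution with mean \mu and standard deviation \sigma is the standard 
textbook statement below (nothing patch-specific):

P(X - \mu \ge k\sigma) \le \frac{1}{1 + k^2},
\qquad\text{so for } k = 1:\quad P(X - \mu \ge \sigma) \le \frac{1}{2}.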

Now, we consider a task tracker to be slow if (tracker's task mean - 
mapTaskStats.mean > mapTaskStats.std * slowNodeThreshold); a runnable sketch 
of this check follows the two cases below.

* for TT1: (tt1.mean - mapTaskStats.mean > mapTaskStats.std) == (1 - 1.5 > 0.5) 
== (-0.5 > 0.5) == false
* for TT2: (tt2.mean - mapTaskStats.mean > mapTaskStats.std) == (2 - 1.5 > 0.5) 
== (0.5 > 0.5) == false
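
Putting the pieces together, here is a small self-contained illustration that 
reproduces the numbers above; the class and method names are made up for this 
example, and the real logic lives in the patch's isSlowTracker():

// Self-contained illustration of the fixed check with the numbers above;
// names are hypothetical, not the patch's code.
class SlowTrackerExample {
  static boolean isSlow(double trackerMean, double jobMean, double jobStd,
                        double slowNodeThreshold) {
    return trackerMean - jobMean > jobStd * slowNodeThreshold;
  }

  public static void main(String[] args) {
    // TT1 ran three 1-second maps, TT2 ran three 2-second maps.
    double[] durations = {1, 1, 1, 2, 2, 2};
    double sum = 0, sumSquares = 0;
    for (double d : durations) {
      sum += d;
      sumSquares += d * d;
    }
    double mean = sum / durations.length;                                // 1.5
    double std = Math.sqrt(sumSquares / durations.length - mean * mean); // 0.5

    System.out.println(isSlow(1.0, mean, std, 1.0)); // TT1: -0.5 > 0.5 -> false
    System.out.println(isSlow(2.0, mean, std, 1.0)); // TT2:  0.5 > 0.5 -> false
  }
}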

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.patch, HADOOP-2141-v2.patch, HADOOP-2141-v3.patch, 
> HADOOP-2141-v4.patch, HADOOP-2141-v5.patch, HADOOP-2141-v6.patch, 
> HADOOP-2141.patch, HADOOP-2141.v7.patch, HADOOP-2141.v8.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative 
> instance of a task is that it must be at least 20% behind the average 
> progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop 
> making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for 
> tasks in the speculative execution check. 
