speculative execution does not handle cases where stddev > mean well
--------------------------------------------------------------------
Key: MAPREDUCE-2162
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2162
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Joydeep Sen Sarma
the new speculation code only speculates tasks whose progress rate falls below the mean progress rate of a job by more than some multiple (typically 1.0) of stddev. but stddev can be larger than the mean - and once a job gets into that situation the speculation threshold (mean minus one stddev) goes negative, so even a task with a progress rate of 0 will not be speculated.
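a minimal sketch of that check (not the actual Hadoop speculator code - class and method names here are illustrative, the accumulator just mimics the DataStatistics mean()/std() seen in the log below):

    // Sketch of the "slower than mean by N stddevs" speculation check.
    public class SpeculationThresholdSketch {

        /** Running statistics over per-task progress rates. */
        static class Stats {
            private int count;
            private double sum;
            private double sumSquares;

            void add(double x) {
                count++;
                sum += x;
                sumSquares += x * x;
            }

            double mean() {
                return count == 0 ? 0.0 : sum / count;
            }

            double std() {
                if (count == 0) {
                    return 0.0;
                }
                double m = mean();
                return Math.sqrt(Math.max(0.0, sumSquares / count - m * m));
            }
        }

        /** A task is a speculation candidate only if its progress rate falls
         *  more than slowTaskMultiple stddevs below the mean. When std() > mean(),
         *  the threshold is negative and no task - not even one with a progress
         *  rate of 0 - can pass this check. */
        static boolean isSpeculationCandidate(double taskProgressRate,
                                              Stats jobStats,
                                              double slowTaskMultiple) {
            double threshold = jobStats.mean() - slowTaskMultiple * jobStats.std();
            return taskProgressRate < threshold;
        }

        public static void main(String[] args) {
            Stats stats = new Stats();
            // progress rates skewed enough that stddev exceeds the mean
            for (double rate : new double[] {1e-8, 1e-12, 1e-12, 1e-12, 1e-12, 1e-12}) {
                stats.add(rate);
            }
            System.out.printf("mean=%g std=%g threshold=%g%n",
                stats.mean(), stats.std(), stats.mean() - stats.std());
            // even a completely stalled task (rate 0) is not speculated
            System.out.println("stalled task speculated? "
                + isSpeculationCandidate(0.0, stats, 1.0));
        }
    }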
it's not clear that this condition is self-correcting. if a job has thousands of tasks, then one laggard task - in spite of not being speculated for a long time - may not shift the statistics enough to fix the condition of stddev > mean.
we have seen jobs where tasks were not speculated for hours, and this seems like one explanation for why that happened. here's an example job with stddev >
mean:
DataStatistics: count is 6, sum is 1.7141054797775723E-8, sumSquares is 2.9381575958035014E-16, mean is 2.8568424662959537E-9, std() is 6.388093955645905E-9
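plugging the logged mean and std() into the same check (assuming the 1.0 multiple) shows the threshold is already negative for this job:

    // quick check using the logged DataStatistics values above
    public class NegativeThresholdCheck {
        public static void main(String[] args) {
            double mean = 2.8568424662959537e-9;
            double std  = 6.388093955645905e-9;
            // mean - std ~= -3.53e-9: negative, so even a task whose
            // progress rate is 0 will never be flagged as slow
            System.out.println("threshold = " + (mean - std));
        }
    }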