[ https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852022#comment-16852022 ]
Ahmed Hussein commented on MAPREDUCE-7208:
------------------------------------------

[~jeagles], [~tgraves], [~vinodkv], [~nroberts]

I had some issues using {{ExponentiallySmoothedTaskRuntimeEstimator}}. After investigating, I implemented a new estimator that addresses some issues with the existing exponential smoothing estimator. Do you mind taking a look at the suggested fixes and implementations?

*{{SimpleExponentialTaskRuntimeEstimator}} (new) vs. {{ExponentiallySmoothedTaskRuntimeEstimator}} (old)*
# The new estimator follows basic (simple) exponential smoothing.
# The new estimator does not return an estimate for the first few cycles. This increases the accuracy of the estimate, especially for long-running tasks.
# The new estimator detects tasks that are slowing down. The old estimator fails to detect such scenarios.
# The new estimator detects stalled tasks. The old estimator will not launch any speculative attempts when an attempt has a sharp slowdown.

*Is the default speculator affected?*
* The speculator still uses the {{LegacyTaskRuntimeEstimator}} by default.
* The existing implementation uses the statistics.mean to compute {{estimatedNewAttemptRuntime()}}. This causes frequent speculation, because even the smallest difference between the {{estimatedRuntime}} and the mean creates a new speculative attempt. I changed the implementation of {{estimatedNewAttemptRuntime()}} so that it uses (mean + a small delta).
* I created a JUnit test, {{TestSpeculativeExecOnCluster}}, that verifies the speculator running on {{MiniMRYarnCluster}}. The test case can also be used for the old estimators.

*Tuning parameters:*
* {{job.task.estimator.simple.exponential.smooth.lambda-ms}}: The lambda value in the smoothing function of the task estimator.
* {{job.task.estimator.simple.exponential.smooth.stagnated-ms}}: The window length after which simple exponential smoothing considers the task attempt stagnated. This allows the speculator to detect stalled progress.
* {{job.task.estimator.simple.exponential.smooth.skip-initials}}: The number of initial readings that the estimator ignores before giving a prediction. Simple smoothing needs several iterations before it adjusts and returns good estimates. The skip-initials parameter instructs the estimator to return "no-information" until the number of progress updates reaches that value.

> Tuning TaskRuntimeEstimator
> ----------------------------
>
> Key: MAPREDUCE-7208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Ahmed Hussein
> Assignee: Ahmed Hussein
> Priority: Minor
> Attachments: MAPREDUCE-7208.001.patch, smoothing-exponential.md
>
>
> By default, MR uses LegacyTaskRuntimeEstimator to get an estimate of the runtime. The estimator does not adjust dynamically to the progress rate of the tasks. On the other hand, the behavior of the existing alternative, "ExponentiallySmoothedTaskRuntimeEstimator", is unpredictable.
>
> There are several dimensions along which to improve the exponential implementation:
> # Exponential smoothing needs a warmup period. Otherwise, the estimate will be affected by the initial values.
> # Using a single smoothing factor (lambda) does not work well for all the tasks. To increase the level of smoothing across the majority of tasks, we need to give a range of flexibility to dynamically adjust the smoothing factor based on the history of the task progress.
> # Design-wise, it is better to separate the statistical model from the MR interface. We need a way to evaluate estimators statistically, without the need to run MR. For example, an estimator can be evaluated as a black box by using a stream of raw data as input and testing the accuracy of the generated stream of estimates.
> # The exponential estimator speculates frequently and fails to detect slowing tasks.
> As a result, a taskAttempt that does not make any progress won't trigger a new speculation.
>
> The file [^smoothing-exponential.md] describes how the simple exponential smoothing factor works.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
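
To make the three tuning parameters from the comment more concrete, here is a minimal Java sketch of simple exponential smoothing applied to a task's progress rate. It is not the code from MAPREDUCE-7208.001.patch: the class name {{SmoothedProgressSketch}}, its methods, the {{NO_INFORMATION}} constant, and the time-weighted formula alpha = 1 - exp(-dt / lambda) are all assumptions chosen for illustration. The constructor arguments only mirror the presumed roles of lambda-ms, stagnated-ms, and skip-initials.

```java
/** Hypothetical sketch of simple exponential smoothing over task progress. */
final class SmoothedProgressSketch {
  static final long NO_INFORMATION = -1L;

  private final double lambdaMs;   // assumed role of lambda-ms: smoothing time constant
  private final long stagnatedMs;  // assumed role of stagnated-ms: stall window
  private final int skipInitials;  // readings to ignore before estimating

  private int readings;            // progress updates seen so far
  private long lastTimeMs;         // timestamp of the latest accepted reading
  private long lastChangeMs;       // last time progress actually advanced
  private double lastProgress;     // fraction complete in [0, 1]
  private double smoothedRate;     // smoothed progress per millisecond

  SmoothedProgressSketch(double lambdaMs, long stagnatedMs, int skipInitials) {
    this.lambdaMs = lambdaMs;
    this.stagnatedMs = stagnatedMs;
    this.skipInitials = skipInitials;
  }

  /** Feeds one progress reading taken at timeMs. */
  void update(long timeMs, double progress) {
    if (readings == 0) {
      // The first reading only seeds the state; no rate can be computed yet.
      lastChangeMs = timeMs;
    } else {
      long dt = timeMs - lastTimeMs;
      if (dt <= 0) {
        return; // ignore duplicate or out-of-order timestamps
      }
      double rate = (progress - lastProgress) / dt;
      if (readings == 1) {
        smoothedRate = rate; // the first observed rate initializes the forecast
      } else {
        // Longer gaps between readings give the new observation more weight.
        double alpha = 1.0 - Math.exp(-dt / lambdaMs);
        smoothedRate = alpha * rate + (1.0 - alpha) * smoothedRate;
      }
      if (progress > lastProgress) {
        lastChangeMs = timeMs;
      }
    }
    lastTimeMs = timeMs;
    lastProgress = progress;
    readings++;
  }

  /** True when progress has not advanced for at least stagnatedMs. */
  boolean isStagnated(long nowMs) {
    return readings > 0 && nowMs - lastChangeMs >= stagnatedMs;
  }

  /** Remaining runtime in ms, or NO_INFORMATION during warmup or at zero rate. */
  long estimateRemainingMs() {
    if (readings <= skipInitials || smoothedRate <= 0.0) {
      return NO_INFORMATION;
    }
    return Math.round((1.0 - lastProgress) / smoothedRate);
  }
}
```

In this sketch, a task that reports steady progress yields no estimate until more than skip-initials readings have arrived. A task whose progress stops advancing sees its smoothed rate decay toward zero, so the remaining-time estimate grows, and {{isStagnated}} becomes true once the stall window has elapsed.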