Ahmed Hussein created MAPREDUCE-7208:
----------------------------------------
Summary: Tuning TaskRuntimeEstimator
Key: MAPREDUCE-7208
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Ahmed Hussein
Assignee: Ahmed Hussein
Attachments: smoothing-exponential.md
By default, MR uses LegacyTaskRuntimeEstimator to get an estimate of the
runtime. The estimator does not adjust dynamically to the progress rate of the
tasks. On the other hand, the existing alternative
"ExponentiallySmoothedTaskRuntimeEstimator" behavior in unpredictable.
There are several dimensions to improve the exponential implementation:
# Exponential shooting needs a warmup period. Otherwise, the estimate will be
affected by the initial values.
# Using a single smoothing factor (Lambda) does not work well for all the
tasks. To increase the level of smoothing across the majority of tasks, we need
to give a range of flexibility to dynamically adjust the smoothing factor based
on the history of the task progress.
# Design wise, it is better to separate between the statistical model and the
MR interface. We need to have a way to evaluate estimators statistically,
without the need to run MR. For example, an estimator can be evaluated as a
black box by using a stream of raw data as input and testing the accuracy of
the generated stream of estimates.
# The exponential estimator speculates frequently and fails to detect slowing
tasks. It does not detect slowing tasks. As a result, a taskAttempt that does
not do any progress won't trigger a new speculation.
The file [^smoothing-exponential.md] describes how Simple Exponential smoothing
factor works.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]