[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852022#comment-16852022
 ] 

Ahmed Hussein commented on MAPREDUCE-7208:
------------------------------------------

 

[~jeagles], [~tgraves], [~vinodkv], [~nroberts]

I ran into some issues using {{ExponentiallySmoothedTaskRuntimeEstimator}}. After 
investigating, I implemented a new estimator that addresses several problems 
with the existing exponential-smoothing estimator. Do you mind taking a look at 
the suggested fixes and implementation?

 

 *{{SimpleExponentialTaskRuntimeEstimator}} (new) vs. 
{{ExponentiallySmoothedTaskRuntimeEstimator}} (old)*
 # The new estimator follows basic exponential smoothing.
 # The new estimator does not return an estimate for the first few cycles. This 
increases the accuracy of the estimate, especially for long-running tasks.
 # The new estimator detects tasks that are slowing down. The old estimator 
fails to detect such scenarios.
 # The new estimator detects stalled tasks. The old estimator does not launch 
any speculative attempts when an attempt slows down sharply.

*Is the default speculator affected?*
 * The speculator still uses {{LegacyTaskRuntimeEstimator}} by default.
 * The existing implementation uses the statistics.mean as the 
{{estimatedNewAttemptRuntime()}}. This causes frequent speculation, because even 
the smallest difference between the {{estimatedRuntime}} and the mean creates a 
new speculative attempt. I changed the implementation of 
{{estimatedNewAttemptRuntime()}} so that it uses (mean + a small delta).
 * I created a JUnit test, {{TestSpeculativeExecOnCluster}}, that verifies the 
speculator running on {{MiniMRYarnCluster}}. The test case can also be used for 
the old estimators.
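
To illustrate the (mean + a small delta) change, here is a minimal, self-contained sketch. It is not the code from the patch: the class name, the method names, and the idea of expressing the delta as a fraction of the mean are assumptions made for illustration only, and the comparison is a simplified stand-in for what the speculator does.

```java
/** Hypothetical sketch of the "mean + small delta" idea; not the patch code. */
class NewAttemptRuntimeSketch {

  /** Mean of completed attempt runtimes, or -1 when there is no data yet. */
  static double mean(long[] completedRuntimesMs) {
    if (completedRuntimesMs.length == 0) {
      return -1.0;
    }
    double sum = 0.0;
    for (long r : completedRuntimesMs) {
      sum += r;
    }
    return sum / completedRuntimesMs.length;
  }

  /** Old behaviour as described above: the plain mean. */
  static double newAttemptRuntimeOld(long[] completedRuntimesMs) {
    return mean(completedRuntimesMs);
  }

  /**
   * New behaviour as described above: the mean plus a small margin.
   * deltaFraction is an assumed way to express the delta (0.05 = 5%).
   */
  static double newAttemptRuntimeWithDelta(long[] completedRuntimesMs,
                                           double deltaFraction) {
    double m = mean(completedRuntimesMs);
    return m < 0 ? m : m * (1.0 + deltaFraction);
  }

  /** Simplified check: speculate only if a fresh attempt looks faster. */
  static boolean wouldSpeculate(double estimatedRuntimeMs,
                                double newAttemptRuntimeMs) {
    return newAttemptRuntimeMs >= 0 && estimatedRuntimeMs > newAttemptRuntimeMs;
  }
}
```

With completed runtimes of 10000, 12000 and 11000 ms, an attempt estimated at 11100 ms exceeds the plain mean and would be speculated under the old rule, while a 5% margin keeps it below the threshold. An attempt estimated at 15000 ms is still speculated in both cases.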

*Tuning parameters:*
 * {{job.task.estimator.simple.exponential.smooth.lambda-ms}}: The lambda value 
in the smoothing function of the task estimator.
 * {{job.task.estimator.simple.exponential.smooth.stagnated-ms}}: The window 
length in the simple exponential smoothing after which a task attempt is 
considered stagnated. This allows the speculator to detect stalled progress.
 * {{job.task.estimator.simple.exponential.smooth.skip-initials}}: The number 
of initial readings that the estimator ignores before giving a prediction. 
Simple smoothing needs several iterations before it adjusts and returns good 
estimates. The skip-initials parameter instructs the estimator to return 
"no-information" until the number of progress updates reaches that value.
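
As a rough illustration of how the three parameters interact, the sketch below smooths the progress rate of a single attempt. It is not the estimator from the patch: the class name, the method names, the time-based weight {{1 - exp(-dt / lambda)}}, and the sample values are all assumptions chosen to keep the example small.

```java
/** Hypothetical sketch of a warm-up-aware smoothed estimator; not the patch code. */
class SmoothedProgressSketch {
  static final long NO_INFORMATION = -1L;

  private final double lambdaMs;   // smoothing time constant (lambda-ms)
  private final long stagnatedMs;  // no-progress window (stagnated-ms)
  private final int skipInitials;  // readings ignored at start (skip-initials)

  private int readings;            // number of accepted progress updates
  private long lastTimeMs;         // timestamp of the last accepted update
  private long lastAdvanceMs;      // last time progress actually increased
  private double lastProgress;     // progress in [0, 1] at the last update
  private double smoothedRate;     // smoothed progress per millisecond

  SmoothedProgressSketch(double lambdaMs, long stagnatedMs, int skipInitials) {
    this.lambdaMs = lambdaMs;
    this.stagnatedMs = stagnatedMs;
    this.skipInitials = skipInitials;
  }

  /** Feeds one progress reading taken at nowMs. */
  void update(long nowMs, double progress) {
    if (readings == 0) {
      lastAdvanceMs = nowMs;
    } else {
      long dt = nowMs - lastTimeMs;
      if (dt <= 0) {
        return; // ignore duplicate or out-of-order readings
      }
      double rate = (progress - lastProgress) / dt;
      if (readings == 1) {
        smoothedRate = rate; // seed the forecast with the first observed rate
      } else {
        // Assumed weighting: longer gaps give the new sample more weight.
        double alpha = 1.0 - Math.exp(-dt / lambdaMs);
        smoothedRate = alpha * rate + (1.0 - alpha) * smoothedRate;
      }
      if (progress > lastProgress) {
        lastAdvanceMs = nowMs;
      }
    }
    lastTimeMs = nowMs;
    lastProgress = progress;
    readings++;
  }

  /** True when progress has not increased for at least stagnatedMs. */
  boolean isStagnated(long nowMs) {
    return readings > 0 && nowMs - lastAdvanceMs >= stagnatedMs;
  }

  /** Remaining time in ms, or NO_INFORMATION during warm-up or at zero rate. */
  long estimateRemainingMs() {
    if (readings <= skipInitials || smoothedRate <= 0.0) {
      return NO_INFORMATION;
    }
    return Math.round((1.0 - lastProgress) / smoothedRate);
  }
}
```

In this sketch, an attempt that stops advancing shows up in two ways: the smoothed rate decays with every flat reading, so the remaining-time estimate grows, and once the flat period reaches the stagnation window {{isStagnated}} returns true. Until more than skip-initials readings have arrived, the estimator reports "no-information" rather than an early, noisy value.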

 

 

> Tuning TaskRuntimeEstimator 
> ----------------------------
>
>                 Key: MAPREDUCE-7208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>         Attachments: MAPREDUCE-7208.001.patch, smoothing-exponential.md
>
>
> By default, MR uses LegacyTaskRuntimeEstimator to get an estimate of the 
> runtime.  The estimator does not adjust dynamically to the progress rate of 
> the tasks. On the other hand, the behavior of the existing alternative, 
> "ExponentiallySmoothedTaskRuntimeEstimator", is unpredictable.
>  
> There are several dimensions along which to improve the exponential 
> implementation:
>  # Exponential smoothing needs a warmup period. Otherwise, the estimate will 
> be affected by the initial values.
>  # Using a single smoothing factor (Lambda) does not work well for all the 
> tasks. To increase the level of smoothing across the majority of tasks, we 
> need to give a range of flexibility to dynamically adjust the smoothing 
> factor based on the history of the task progress.
>  # Design-wise, it is better to separate the statistical model from 
> the MR interface. We need to have a way to evaluate estimators statistically, 
> without the need to run MR. For example, an estimator can be evaluated as a 
> black box by using a stream of raw data as input and testing the accuracy of 
> the generated stream of estimates.
>  # The exponential estimator speculates frequently and fails to detect 
> slowing tasks. As a result, a taskAttempt that does not make any progress 
> won't trigger a new speculation.
>  
> The file [^smoothing-exponential.md] describes how the simple exponential 
> smoothing factor works.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
