[ https://issues.apache.org/jira/browse/MAPREDUCE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907476#action_12907476 ]
Hong Tang commented on MAPREDUCE-2039:
--------------------------------------
There would be a lot of heuristics involved in the tuning of speculative
execution. Should we have a benchmark that would reliably and quantitatively
measure the effectiveness of speculative execution?
> Improve speculative execution
> -----------------------------
>
> Key: MAPREDUCE-2039
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2039
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Reporter: Dick King
> Assignee: Dick King
>
> In speculation, the framework issues a second task attempt on a task where
> one attempt is already running. This is useful if the running attempt is
> bogged down for reasons outside of the task's code, so a second attempt
> finishes ahead of the existing attempt, even though the first attempt has a
> head start.
> Early versions of speculation had the weakness that an attempt that starts
> out well but breaks down near the end would never get speculated. That got
> fixed in HADOOP:2141, but with that fix speculation doesn't engage until
> the performance of the old attempt, _even counting the early portion where it
> progressed normally_, is significantly worse than average.
> I want to fix that by overweighting the more recent progress increments. In
> particular, I would like to use exponential smoothing with a lambda of
> approximately 1/minute [which is the time scale of speculative execution] to
> measure progress per unit time. This affects the speculation code in two
> places:
> * It affects the set of task attempts we consider to be underperforming.
> * It affects our estimates of when we expect tasks to finish. This could
> be hugely important; speculation's main benefit is that it gets a single
> outlier task finished earlier than otherwise possible, and we need to know
> which task is the outlier as accurately as possible.
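
To make the proposal above concrete, here is a minimal sketch of an exponentially smoothed progress-rate tracker, written in Python for brevity rather than as Hadoop code. The class name, its methods, and the choice to weight each report by 1 - exp(-lambda * dt) are assumptions made for illustration; the issue only specifies exponential smoothing with a lambda of about 1/minute.

```python
import math


class SmoothedProgressRate:
    """Exponentially smoothed progress-per-second estimate for one attempt.

    Hypothetical helper for illustration only; not part of Hadoop.
    """

    def __init__(self, start_time, decay_per_second=1.0 / 60.0):
        # decay_per_second plays the role of lambda (about 1/minute).
        self.decay = decay_per_second
        self.last_time = start_time
        self.last_progress = 0.0
        self.rate = None  # no estimate until the first progress report

    def update(self, now, progress):
        """Fold in a progress report (progress in [0, 1]) taken at time now."""
        dt = now - self.last_time
        if dt <= 0:
            return self.rate  # ignore duplicate or out-of-order reports
        instantaneous = (progress - self.last_progress) / dt
        if self.rate is None:
            self.rate = instantaneous
        else:
            # Longer gaps between reports give the new sample more weight.
            alpha = 1.0 - math.exp(-self.decay * dt)
            self.rate = alpha * instantaneous + (1.0 - alpha) * self.rate
        self.last_time = now
        self.last_progress = progress
        return self.rate

    def estimated_finish(self):
        """Projected completion time based on the smoothed rate."""
        if self.rate is None or self.rate <= 0:
            return math.inf
        return self.last_time + (1.0 - self.last_progress) / self.rate
```

Under these assumptions, an attempt that runs well for a while and then slows down sees its smoothed rate decay toward the recent rate within a few minutes, whereas the lifetime average progress / elapsed time keeps crediting the early fast portion. The same smoothed rate feeds both the underperformance test and the finish-time estimate.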
> I would like a rich suite of configuration variables, minimally including
> lambda and possibly weighting factors. We might keep two exponentially
> smoothed tracking variables for the progress rate, to distinguish attempts
> that are bogged down and getting worse from those that are bogged down but
> improving.
> Perhaps we should be especially eager to speculate a second attempt. If a
> task is deterministically failing after bogging down [think "rare infinite
> loop bug"] we would rather run a couple of our attempts in parallel to
> discover the problem sooner.
> As part of this patch we would like to add benchmarks that simulate rare
> tasks that behave poorly, so we can discover whether this change in the code
> is a good idea and what the proper configuration is. Early versions of this
> will be driven by our assumptions. Later versions will be driven by the
> fruits of MAPREDUCE:2037.
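
As a rough idea of what such a benchmark could measure, the sketch below simulates one attempt that progresses normally and then slows down, and reports how long each rate estimate takes to fall below a threshold. Everything here is an assumption chosen for illustration (the function name, the healthy and stalled rates, the one-second report interval, the threshold); it is not the Hadoop speculator and not the benchmark proposed in the issue.

```python
import math

NORMAL_RATE = 0.01      # assumed progress per second for a healthy attempt
STALLED_RATE = 0.001    # assumed progress per second once bogged down
DECAY = 1.0 / 60.0      # lambda of about 1/minute, as in the issue
REPORT_INTERVAL = 1.0   # assumed seconds between progress reports


def detection_delay(stall_at, threshold, smoothed):
    """Seconds from the stall until the rate estimate drops below threshold.

    Returns None if the simulated attempt finishes before being flagged.
    """
    now, progress, rate = 0.0, 0.0, None
    alpha = 1.0 - math.exp(-DECAY * REPORT_INTERVAL)
    while progress < 1.0:
        step_rate = NORMAL_RATE if now < stall_at else STALLED_RATE
        now += REPORT_INTERVAL
        progress += step_rate * REPORT_INTERVAL
        if smoothed:
            # Exponentially smoothed rate: recent reports dominate.
            if rate is None:
                rate = step_rate
            else:
                rate = alpha * step_rate + (1.0 - alpha) * rate
        else:
            # Lifetime average: the early healthy portion still counts.
            rate = progress / now
        if now > stall_at and rate < threshold:
            return now - stall_at
    return None


if __name__ == "__main__":
    for stall_at in (60.0, 90.0):
        for smoothed in (False, True):
            label = "smoothed" if smoothed else "lifetime average"
            delay = detection_delay(stall_at, 0.005, smoothed)
            print(f"stall at {stall_at:.0f}s, {label}: {delay}")
```

With these assumed numbers, the smoothed estimate crosses the threshold a little under a minute after the stall regardless of when the stall happens. The lifetime average takes longer for a stall at 60 seconds and never crosses it for a stall at 90 seconds, which is the late-breakdown case the description is concerned with. A real benchmark would also need to account for the cost of unnecessary speculative attempts.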