[
https://issues.apache.org/jira/browse/MAPREDUCE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930867#action_12930867
]
Adam Kramer commented on MAPREDUCE-1608:
----------------------------------------
As a user, I have found the ability to manually speculate tasks via the website
incredibly useful--so useful that I'm starting to worry about RSI given that
each speculation takes a click to the task page, a click to the task, a click
on speculate, and a click on the confirm dialog box. These are frequently
lost-task-tracker failures, and Hadoop currently just sets a timeout on them.
But how am I beating the current system? I'm comparing some tasks' performance
to other tasks in the same job:
1) If there is only one task (either map or reduce), always speculate. Maybe
turn this off for clusters that have very few slots, but in the case of >1000
slots or so, this is trivial and would basically prevent jobs taking literally
twice as long.
2) Collect data on other tasks in the same job. If 99% of mappers went from 0%
complete to >0% complete within 5 seconds, and it has been 5 minutes with the
remaining mappers still not changing, speculate them. Ditto reducers.
Unbalanced data may cause these problems.
3) Collect data on delays. If a task doesn't improve its % complete in some
timeframe determined by the other tasks for the same job, speculate the "hung"
task.
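The three heuristics above could be sketched roughly as follows. This is only an illustration, not anything Hadoop implements: the TaskStats record, the should_speculate function, and the min_slots and slack thresholds are all hypothetical names and values, and the median of the peer tasks stands in for the "99% of mappers" style of comparison.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class TaskStats:
    """Hypothetical per-task snapshot; not a Hadoop data structure."""
    task_id: str
    elapsed: float                   # seconds since the task started
    first_progress: Optional[float]  # seconds until progress exceeded 0%, None if still at 0%
    longest_gap: float               # longest observed time between two progress updates
    since_update: float              # seconds since the last progress update


def should_speculate(task: TaskStats, peers: List[TaskStats], total_slots: int,
                     min_slots: int = 1000, slack: float = 10.0) -> bool:
    """Return True if `task` looks like a candidate for speculation.

    `peers` holds the other tasks of the same type (map or reduce) in the
    same job. `min_slots` and `slack` are illustrative, untuned thresholds.
    """
    # Heuristic 1: a lone task on a large cluster is cheap to duplicate.
    if not peers:
        return total_slots >= min_slots

    # Heuristic 2: the task is still at 0% long after its peers got going.
    starts = [p.first_progress for p in peers if p.first_progress is not None]
    if task.first_progress is None and starts:
        if task.elapsed > slack * median(starts):
            return True

    # Heuristic 3: the task has been silent far longer than its peers ever were.
    typical_gap = median(p.longest_gap for p in peers)
    return typical_gap > 0 and task.since_update > slack * typical_gap
```

With peers that typically start reporting progress after about 5 seconds, a task still at 0% after 5 minutes is flagged by the second check, which matches the example in (2). The thresholds would need tuning per cluster, which is exactly the hard part.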
...in other words, I agree that there is probably an easy way to model the
failed tasks, but only from a modeling perspective. Getting the heuristics and
models right and implementing them is probably much much more difficult than
implementing "hadoop job -speculate-task task_identifier_here."
But also, implementing the latter is *necessary* to discover how and when
the heuristics themselves are failing...giving users the ability to do this
also gives admins the ability to see when users are doing this.
> Allow users to do speculative execution of a task manually
> ----------------------------------------------------------
>
> Key: MAPREDUCE-1608
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1608
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Scott Chen
> Assignee: Scott Chen
>
> Speculative execution improves the latency of the job. Sometimes the job has
> a few very slow reducers. Spending a little more resource on speculative tasks
> can improve the latency a lot. It would be nice if users could manually
> select one task and force speculative execution on that task, just as we
> can manually kill/fail a task.
> The proposal is to add a link that says "speculate" to the taskdetails.jsp
> page, where we do "kill/fail".
> Thoughts?