[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930867#action_12930867
 ] 

Adam Kramer commented on MAPREDUCE-1608:
----------------------------------------

As a user, I have found the ability to manually speculate tasks via the website 
incredibly useful--so useful that I'm starting to worry about RSI given that 
each speculation takes a click to the task page, a click to the task, a click 
on speculate, and a click on the confirm dialog box. These are frequently 
lost-task-tracker failures, and Hadoop currently just sets a timeout on them.

But how am I beating the current system? I'm comparing some tasks' performance 
to other tasks in the same job:

1) If there is only one task (either map or reduce), always speculate. Maybe 
turn this off for clusters that have very few slots, but with >1000 slots or 
so the extra cost is trivial, and speculating would basically prevent jobs 
from taking literally twice as long.

2) Collect data on other tasks in the same job. If 99% of mappers went from 0% 
complete to >0% complete in 5 seconds and the remaining mappers still have not 
changed after 5 minutes, speculate them. Ditto reducers. Unbalanced data may 
cause these problems.

3) Collect data on delays. If a task doesn't improve its % complete in some 
timeframe determined by the other tasks for the same job, speculate the "hung" 
task.
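
To make the three rules above concrete, here is a rough Python sketch. It is 
not Hadoop code: TaskStatus, should_speculate, and the slack and min_wait_secs 
defaults are all hypothetical names and numbers I made up for illustration, 
and "peers" just means the other tasks of the same type in the same job.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class TaskStatus:
    """Hypothetical snapshot of one task attempt; not a Hadoop class."""
    task_id: str
    progress: float                          # fraction complete, 0.0 to 1.0
    secs_running: float                      # time since the attempt started
    secs_to_first_progress: Optional[float]  # None while still at 0%
    secs_since_progress: float               # time since progress last moved
    longest_gap_secs: float                  # longest pause between updates so far


def should_speculate(task: TaskStatus, peers: List[TaskStatus],
                     slack: float = 10.0, min_wait_secs: float = 60.0,
                     speculate_lone_task: bool = True) -> bool:
    """Return True if `task` looks worth speculating, judged against `peers`."""
    if task.progress >= 1.0:
        return False  # already finished, nothing to speculate

    # Rule 1: the only map (or reduce) task in the job is always speculated,
    # unless this is switched off (e.g. on a cluster with very few slots).
    if not peers:
        return speculate_lone_task

    # Rule 2: the task is still at 0% long after its peers left 0%.
    starts = [p.secs_to_first_progress for p in peers
              if p.secs_to_first_progress is not None]
    if task.secs_to_first_progress is None and starts:
        if task.secs_running > max(min_wait_secs, slack * median(starts)):
            return True

    # Rule 3: progress has been flat for much longer than the typical
    # longest pause seen on the peers.
    typical_gap = median([p.longest_gap_secs for p in peers])
    return task.secs_since_progress > max(min_wait_secs, slack * typical_gap)
```

With these made-up defaults, a mapper that is still at 0% after 5 minutes 
while its peers all left 0% within 5 seconds is flagged, and one that has only 
been running for 30 seconds is not. Picking good thresholds is exactly the 
hard part.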

...in other words, I agree that there is probably an easy way to model the 
failed tasks, but only from a modeling perspective. Getting the heuristics and 
models right and implementing them is probably much much more difficult than 
implementing "hadoop job -speculate-task task_identifier_here."

Also, implementing the latter is *necessary* to discover how and when the 
heuristics themselves are failing: giving users the ability to do this also 
gives admins the ability to see when users are doing this.

> Allow users to do speculative execution of a task manually
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-1608
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1608
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>
> Speculative execution improves the latency of the job. Sometimes the job has 
> a few very slow reducers. Spending a little more resource on speculative 
> tasks can improve the latency a lot. It would be nice if users could manually 
> select one task and force speculative execution on that task, just like we 
> can manually kill/fail a task.
> The proposal is to add a link that says "speculate" to the taskdetails.jsp 
> page, where we do "kill/fail".
> Thoughts? 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
