[
https://issues.apache.org/jira/browse/HADOOP-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610455#action_12610455
]
Amar Kamat commented on HADOOP-3687:
------------------------------------
Chris,
We have been thinking about this for some time. The problem will become more
visible once the new scheduler comes in, since it will have the pre-emption
feature. From our offline discussions, it makes more sense to _suspend/resume_
*reduce* tasks, since on average the reducers run longer and mostly determine
the job runtime. It is also easier to suspend reducers, as one can always save
the shuffled data and restart the _REDUCE_ phase. Saving shuffle data might be
a huge gain, but again there are issues with wasted resources and clean-up.
With maps it is difficult, since the maps mostly run faster and have just one
phase, i.e. the MAP phase. When a map task runs, the following things
determine its state:
1) The offset in the input that is read
2) The mapped <k,v> in the memory
3) The data spilled to disk
4) External connections
One could probably optimise by reusing what is already spilled and moving to
the saved offset on restart/resume, but it is not clear how much gain this will
give, or whether there are any use-cases that strongly demand it. Holding tasks
in memory (i.e. the pause) might not be scalable. Hence we should think about
suspend-to-fs/resume. As Vivek rightly pointed out (offline), it is not
guaranteed that the job/org will get the same set of nodes back, and hence
saving this state on the FileSystem might not make sense. Thoughts? Comments?
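
To make the map-side state concrete, here is a minimal Python sketch of a suspend/resume loop. It is a toy model and not Hadoop code: `MapTaskCheckpoint`, `run_map`, `all_output`, and `spill_limit` are hypothetical names, spills are modelled as in-memory lists rather than files on disk, and external connections (item 4) are assumed to be reopened by the task on resume.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

KV = Tuple[str, int]


@dataclass
class MapTaskCheckpoint:
    """Hypothetical snapshot of a suspended map task."""
    input_offset: int = 0                                   # 1) offset in the input already read
    in_memory: List[KV] = field(default_factory=list)       # 2) mapped <k,v> still in memory
    spilled: List[List[KV]] = field(default_factory=list)   # 3) data already spilled
    # 4) external connections are not captured here


def run_map(records: List[str],
            map_fn: Callable[[str], List[KV]],
            checkpoint: Optional[MapTaskCheckpoint] = None,
            suspend_after: Optional[int] = None,
            spill_limit: int = 2) -> Tuple[MapTaskCheckpoint, bool]:
    """Run (or resume) the toy map loop.

    Returns (state, finished). If suspend_after is set, the loop stops after
    that many records in this run and returns finished=False.
    """
    # Work on a copy so the saved checkpoint itself is never mutated.
    state = copy.deepcopy(checkpoint) if checkpoint is not None else MapTaskCheckpoint()
    processed = 0
    while state.input_offset < len(records):
        if suspend_after is not None and processed >= suspend_after:
            return state, False
        state.in_memory.extend(map_fn(records[state.input_offset]))
        state.input_offset += 1
        processed += 1
        if len(state.in_memory) >= spill_limit:
            # "Spill" the buffer; a real task would write a file here.
            state.spilled.append(state.in_memory)
            state.in_memory = []
    return state, True


def all_output(state: MapTaskCheckpoint) -> List[KV]:
    """Spilled runs followed by whatever is still buffered."""
    return [kv for run in state.spilled for kv in run] + state.in_memory


def word_count_map(line: str) -> List[KV]:
    return [(word, 1) for word in line.split()]


lines = ["a b", "c", "d e f", "g"]
full, full_done = run_map(lines, word_count_map)
paused, paused_done = run_map(lines, word_count_map, suspend_after=2)
resumed, resumed_done = run_map(lines, word_count_map, checkpoint=paused)
```

In this sketch the resumed run skips straight to `paused.input_offset` and reuses the spilled runs, so `all_output(resumed)` matches `all_output(full)`. Whether the saved state is still reachable from the node where the task resumes is exactly the open question raised above.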
> Ability to pause/resume tasks
> -----------------------------
>
> Key: HADOOP-3687
> URL: https://issues.apache.org/jira/browse/HADOOP-3687
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Reporter: Chris Smith
> Priority: Minor
>
> It would be nice to be able to pause (and subsequently resume) tasks that are
> currently running, in order to allow tasks from higher priority jobs to
> execute. At present it is quite easy for long-running tasks from low priority
> jobs to block a task from a newer high priority job, and there is no way to
> force the execution of the high priority task without killing the low
> priority jobs.