[ https://issues.apache.org/jira/browse/HADOOP-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610455#action_12610455 ]

amar_kamat edited comment on HADOOP-3687 at 7/3/08 10:48 PM:
-------------------------------------------------------------

Chris,
We have been thinking about this for some time. This problem will become more 
visible once the new scheduler comes in, since it will have the pre-emption 
feature. 

From our offline discussions it makes more sense to _suspend/resume_ *reduce* 
tasks, since on average the reducers run longer and mostly determine the job 
runtime. It is also easier to suspend reducers, as one can always save the 
shuffled data and restart the _REDUCE_ phase. Saving the shuffle data might be 
a huge gain, but there are still issues with wasted resources and clean-up. 
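To make that concrete, here is a rough sketch. None of these class or method 
names exist in Hadoop; they are placeholders for the idea that the fetched map 
outputs already live on local disk, so suspending only has to record which 
segments are complete and resuming just restarts the _REDUCE_ phase over them:

{code:java}
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch, not a Hadoop API. The shuffled map outputs are
// already on local disk, so "suspend" only persists the list of fully
// fetched segments and frees the slot; "resume" reloads that list and
// re-enters the REDUCE phase without re-fetching anything.
public class ReduceSuspendResume {

    private final Path checkpointFile; // marker listing completed shuffle segments

    public ReduceSuspendResume(Path checkpointFile) {
        this.checkpointFile = checkpointFile;
    }

    // Suspend: record which map-output segments have been fetched; the
    // task can then exit and give its slot to a higher-priority job.
    public void suspend(List<String> fetchedSegments) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(checkpointFile)) {
            for (String segment : fetchedSegments) {
                w.write(segment);
                w.newLine();
            }
        }
    }

    // Resume: reload the segment list and restart the REDUCE phase over
    // the saved shuffle data.
    public List<String> resume() throws IOException {
        return Files.readAllLines(checkpointFile);
    }
}
{code}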

With maps it is more difficult, since maps usually run faster and have just 
one phase, i.e. the _MAP_ phase. When a map task runs, the following determine 
its state:
1) The offset in the input that has been read
2) The mapped <k,v> pairs in memory
3) The data spilled to disk
4) External connections 
One could probably optimize by reusing what is already spilled and seeking to 
the saved offset on restart/resume (see the sketch after this list), but it is 
not clear how much gain this would give, or whether there are any use-cases 
that strongly demand it. 
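As a rough sketch with made-up names (not Hadoop classes), the first three 
items could be captured in a small serializable checkpoint; the in-memory 
<k,v> pairs would be flushed as one final spill on suspend, and external 
connections (item 4) cannot be saved and would have to be re-opened on resume:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

// Hypothetical sketch, not a Hadoop API. Items 1-3 above become a small
// serializable record: the input offset and the spill file names (the
// in-memory buffer is flushed as a final spill before suspending).
public class MapCheckpoint implements Serializable {

    final long inputOffset;        // item 1: how far into the input split we have read
    final List<String> spillFiles; // item 3: spills on disk, including the final flush of item 2

    MapCheckpoint(long inputOffset, List<String> spillFiles) {
        this.inputOffset = inputOffset;
        this.spillFiles = spillFiles;
    }

    void save(File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
        }
    }

    static MapCheckpoint load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (MapCheckpoint) in.readObject();
        }
    }
}
{code}

On resume the task would seek its record reader to inputOffset, register the 
existing spillFiles with the sort/merge machinery, and re-establish any 
external connections itself.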

Holding tasks in memory (i.e. the pause) might not be scalable, hence we 
should think about suspend-to-fs/resume. As Vivek rightly pointed out 
(offline), it is not guaranteed that the job/org will get the same set of 
nodes back, so saving this state on local disk might not make sense, and 
saving to DFS would be a huge hit. Thoughts? Comments?
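One way to frame that trade-off (again purely a sketch, with hypothetical 
names and a made-up attempt id): checkpoint to cheap local disk, and if the 
node is gone on resume, fall back to re-running the task from scratch rather 
than paying DFS replication for every suspended task:

{code:java}
import java.io.File;

// Hypothetical sketch, not a Hadoop API. Local-disk checkpoints are cheap
// but die with the node; DFS checkpoints survive node loss but pay
// replication cost per suspended task. This policy tries the local copy
// and signals a full restart when it is missing.
public class SuspendToFsPolicy {

    static File localCheckpoint(String taskId) {
        return new File(System.getProperty("java.io.tmpdir"), taskId + ".ckpt");
    }

    // Returns the checkpoint file if this node still has it, or null to
    // tell the caller to re-run the task from the beginning.
    static File tryResume(String taskId) {
        File ckpt = localCheckpoint(taskId);
        return ckpt.exists() ? ckpt : null;
    }

    public static void main(String[] args) {
        File ckpt = tryResume("attempt_200807032248_0001_r_000000_0");
        System.out.println(ckpt == null
            ? "no local checkpoint; re-running task from scratch"
            : "resuming from " + ckpt);
    }
}
{code}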

> Ability to pause/resume tasks
> -----------------------------
>
>                 Key: HADOOP-3687
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3687
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Chris Smith
>            Priority: Minor
>
> It would be nice to be able to pause (and subsequently resume) tasks that are 
> currently running, in order to allow tasks from higher priority jobs to 
> execute. At present it is quite easy for long-running tasks from low priority 
> jobs to block a task from a newer high priority job, and there is no way to 
> force the execution of the high priority task without killing the low 
> priority jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
