On Wednesday 16 July 2008 15:41:53 Murali Krishna wrote:
> Hi,
>
> I have to run a small MR job while a bigger job is already running. The first job takes around 20 hours to finish and the second around 1 hour. The second job will be given a higher priority. The problem is that the first set of job1's reducers occupies all the reduce slots and waits until all of job1's maps have completed. So even though job2's maps got scheduled in between and completed long ago, job2's reducers won't be scheduled until that first set of job1's reducers completes.
>
> Is there a way to preempt the initial set of reducers of job1? I could even kill all the reduce tasks of job1, but I would like to know whether there is a better way of achieving this.
>
> [HOD might be a solution, but we want to avoid splitting the nodes and would like to use all the nodes for both jobs. We are OK with the first job getting delayed.]
The following ideas all have their problems:

-) For these cases I usually start job2 and add at least one new node. That's easy for us EC2 users, and slightly more complicated for people who still own their hardware :-P
-) Kill the reducers manually. The problem with this is, naturally, that each kill gets counted as a failed reducer on job1. Depending on your settings and the probabilities involved, that might increase the chance that job1 fails in an unacceptable way.

Andreas
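To make the "probabilities involved" concrete, here is a rough back-of-the-envelope model, not anything taken from Hadoop itself. It assumes each reduce task gets at most `max_attempts` tries (Hadoop exposes this as `mapred.reduce.max.attempts`, which I believe defaults to 4), that every natural attempt fails independently with probability `p`, and, as described above, that each manual kill is counted as one used-up failed attempt. The function names are hypothetical.

```python
# Hypothetical model, not Hadoop code. Assumptions:
#  - a reduce task may be attempted at most max_attempts times
#    (cf. mapred.reduce.max.attempts);
#  - each natural attempt fails independently with probability p;
#  - each manual kill is counted as one failed attempt.

def task_failure_prob(p: float, max_attempts: int, killed: int) -> float:
    """Probability that one reduce task exhausts all of its attempts."""
    remaining = max_attempts - killed
    if remaining <= 0:
        return 1.0  # the manual kills alone used up every attempt
    # The task only fails for good if every remaining attempt fails.
    return p ** remaining


def job_failure_prob(p: float, max_attempts: int, killed: int,
                     num_reducers: int) -> float:
    """Probability that at least one of num_reducers tasks fails for good,
    assuming each reducer had `killed` attempts killed manually."""
    task_ok = 1.0 - task_failure_prob(p, max_attempts, killed)
    return 1.0 - task_ok ** num_reducers


if __name__ == "__main__":
    # Illustrative numbers only: 1% attempt failure rate, 100 reducers.
    for kills in range(3):
        print(kills, job_failure_prob(0.01, 4, kills, 100))
```

Under these assumptions every manual kill removes one retry from each affected reducer, so the job's failure probability grows roughly by a factor of 1/p per kill. That is why killing reducers is cheap on a reliable cluster but risky on a flaky one.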
