Paul Sutter wrote:
> First, it matters in the case of concurrent jobs. If you submit a 20-minute job while a 20-hour job is running, it would be nice if the reducers for the 20-minute job could get a chance to run before all of the 20-hour job's mappers have finished. So even without a throughput improvement, you gain an important capability (although it may require another minor tweak or two to make possible).
I fear that more than a minor tweak or two will be required to make concurrent jobs work well. For example, you would also want to make sure that the long-running job does not consume all of the reduce slots, or the short job would again get stuck behind it. Pausing long-running tasks might be required.
The best way to do this at present is to run two job trackers, and two task trackers per node, then submit long-running jobs to one "cluster" and short-running jobs to the other.
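As a rough sketch of what that second "cluster" might look like: the short-job task trackers would point at a second job tracker via a separate configuration file. This assumes the classic mapred-site.xml-style configuration; the host, port, and directory values below are purely illustrative, not a tested setup.

```xml
<!-- Illustrative second-cluster configuration for short jobs.
     Values (host, port, paths) are examples only. -->
<property>
  <name>mapred.job.tracker</name>
  <!-- A second job tracker on a distinct port from the main cluster's. -->
  <value>master:9002</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <!-- Separate local scratch space so the two task trackers on each
       node do not collide. -->
  <value>/tmp/hadoop-short/mapred/local</value>
</property>
```

Jobs would then be submitted against whichever configuration matches their expected running time.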
> Secondarily, we often have stragglers, where one mapper runs slower than the others. When this happens, we end up with a largely idle cluster for as long as an hour. In cases like these, good support for concurrent jobs _would_ improve throughput.
Can you perhaps increase the number of map tasks, so that even a slow task takes only a very small portion of the total execution time?
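For example, something along these lines in the job's configuration might help; note that the value is only a hint to the framework, and the figure here is illustrative rather than a recommendation:

```xml
<!-- Request many small map tasks so that a single straggler holds only
     a small fraction of the total work. Example value only; the
     framework treats this as a hint, not a guarantee. -->
<property>
  <name>mapred.map.tasks</name>
  <value>2000</value>
</property>
```

With finer-grained tasks, even a map task that runs several times slower than its peers delays the job by only a small slice of the total execution time.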
Good support for concurrent jobs would be great to have, and I'd love to see a patch that addresses this issue comprehensively. I am not convinced that it is worth making minor tweaks that may or may not really help us to get there.
Doug
