Runping Qi wrote:
An improvement over Doug's proposal is to make the limit soft in the
following sense:

1. A job is entitled to run up to the limit number of tasks.
2. If there are free slots and no other job is waiting for its entitled
slots, a job can run more tasks than the limit.
3. When a job is running more tasks than its limit and a new job comes,
we may do one of two things:
        a) kill some of the tasks to make room for the new job, or
        b) let all the running tasks run to completion, and assign any
freed-up slot to the new job.

I think this would be a good second phase, as it will be trickier to implement.
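
As a rough sketch, the per-slot decision for rules 1 and 2 might look
something like the following (all the names here are made up, and
"waiting for entitled slots" is approximated as simply running below
one's limit):

    import java.util.List;

    class JobInfo {
        int runningTasks;    // tasks this job currently has running
        int limit;           // the job's entitled number of tasks
        boolean speculative; // whether speculative execution is enabled
    }

    class SoftLimitPolicy {
        // Decide whether a job may be handed one more free slot.
        static boolean mayRunAnotherTask(JobInfo job, List<JobInfo> jobs,
                                         int freeSlots) {
            if (job.runningTasks < job.limit) {
                return true;                  // rule 1: within entitlement
            }
            if (freeSlots <= 0) {
                return false;                 // no spare capacity at all
            }
            for (JobInfo other : jobs) {
                if (other != job && other.runningTasks < other.limit) {
                    return false;             // rule 2: another job still
                }                             // waits for entitled slots
            }
            // Per the caveat below, only jobs that tolerate killed tasks
            // (speculative execution enabled) may exceed their limit,
            // since over-limit tasks may later be killed (rule 3a).
            return job.speculative;
        }
    }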

Jobs that disable speculative execution may not like having tasks killed (although they must in general still be tolerant of it), so we might only permit jobs with speculative execution enabled to exceed their limit.

Also, there should be a delay before a job is permitted to run over its limit, in order to give other jobs an opportunity to launch. For example, if a user is submitting a series of jobs, each consuming the output of the previous, then we wouldn't want an already running job to immediately consume all the free slots when one job completes, since another job that is more deserving of these slots will soon be started. Perhaps, when portions of the cluster are idle, jobs should gradually be permitted to exceed their limit. Then, if new jobs are launched, over-limit tasks should only gradually be killed, first being given the opportunity to finish normally. Some tuning will probably be required to get this right, as in the sketch below.
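
A sketch of that ramp-up and grace-period behavior (the constants are
made-up placeholders for the knobs that would need tuning):

    class OverLimitRamp {
        // Made-up tuning constants.
        static final long RAMP_UP_DELAY_MS = 60 * 1000L;  // idle time before any overage
        static final int OVERAGE_SLOTS_PER_MINUTE = 2;    // how fast overage may grow
        static final long KILL_GRACE_MS = 2 * 60 * 1000L; // let tasks try to finish first

        // Extra slots a job may use beyond its limit, growing gradually
        // once the cluster has been idle long enough.  During the initial
        // delay, newly submitted jobs get a chance to claim the free slots.
        static int allowedOverage(long idleSinceMs, long nowMs) {
            long idleMs = nowMs - idleSinceMs;
            if (idleMs < RAMP_UP_DELAY_MS) {
                return 0;
            }
            return (int) ((idleMs - RAMP_UP_DELAY_MS)
                          * OVERAGE_SLOTS_PER_MINUTE / (60 * 1000L));
        }

        // When a more deserving job arrives, over-limit tasks are not
        // killed immediately; they first get a grace period in which to
        // finish normally.
        static boolean shouldKillOverLimitTask(long newJobArrivedMs, long nowMs) {
            return nowMs - newJobArrivedMs > KILL_GRACE_MS;
        }
    }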

Ideally the limit would be dynamic, perhaps something like max(10, #slots/#jobs), so jobs would only be queued when there are fewer than 10 slots per job. But a static limit would still be a significant improvement and would be easier to implement in a first version.
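
In code, with an illustrative wrapper class, the dynamic limit would
just be:

    class DynamicLimit {
        // max(10, #slots/#jobs): with plenty of capacity each job's limit
        // is its even share; under contention the floor of 10 means jobs
        // queue only when there are fewer than 10 slots per job.
        static int perJobLimit(int totalSlots, int runningJobs) {
            if (runningJobs == 0) {
                return totalSlots;   // no contention, no effective limit
            }
            return Math.max(10, totalSlots / runningJobs);
        }
    }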

Doug
