Runping Qi wrote:
An improvement over Doug's proposal is to make the limit soft in the
following sense:

1. A job is entitled to run up to the limit number of tasks.
2. If there are free slots and no other job is waiting for its entitled
slots, a job can run more tasks than the limit.
3. When a job is running more tasks than its limit and a new job comes,
we may do one of two things:
        a) kill some of the tasks to make room for the new job, or
        b) let all the running tasks run to completion, and assign any
freed-up slot to the new job.

I think this would be a good second phase, as it will be trickier to implement.
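
As a rough sketch, the per-slot decision for rules 1 and 2 might look
something like the following (all the names here are made up, and
"waiting for entitled slots" is approximated as simply running below
one's limit):

    import java.util.List;

    class JobInfo {
        int runningTasks;    // tasks this job currently has running
        int limit;           // the job's entitled number of tasks
        boolean speculative; // whether speculative execution is enabled
    }

    class SoftLimitPolicy {
        // Decide whether a job may be handed one more free slot.
        static boolean mayRunAnotherTask(JobInfo job, List<JobInfo> jobs,
                                         int freeSlots) {
            if (job.runningTasks < job.limit) {
                return true;                  // rule 1: within entitlement
            }
            if (freeSlots <= 0) {
                return false;                 // no spare capacity at all
            }
            for (JobInfo other : jobs) {
                if (other != job && other.runningTasks < other.limit) {
                    return false;             // rule 2: another job still
                }                             // waits for entitled slots
            }
            // Per the caveat below, only jobs that tolerate killed tasks
            // (speculative execution enabled) may exceed their limit,
            // since over-limit tasks may later be killed (rule 3a).
            return job.speculative;
        }
    }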

Jobs that disable speculative execution may not like having tasks killed (although they must in general still be tolerant of it), so we might only permit jobs with speculative execution enabled to exceed their limit.

Also, there should be a delay before a job is permitted to run over its limit, in order to give other jobs an opportunity to launch. For example, if a user is submitting a series of jobs, each consuming the output of the previous, then we wouldn't want an already running job to immediately consume all the free slots when one job completes, since another job that is more deserving of these slots will soon be started. Perhaps, when portions of the cluster are idle, jobs should gradually be permitted to exceed their limit. Then, if new jobs are launched, over-limit tasks should only gradually be killed, first being given the opportunity to finish normally. Some tuning will probably be required to get this right, as in the sketch below.
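
A sketch of that ramp-up and grace-period behavior (the constants are
made-up placeholders for the knobs that would need tuning):

    class OverLimitRamp {
        // Made-up tuning constants.
        static final long RAMP_UP_DELAY_MS = 60 * 1000L;  // idle time before any overage
        static final int OVERAGE_SLOTS_PER_MINUTE = 2;    // how fast overage may grow
        static final long KILL_GRACE_MS = 2 * 60 * 1000L; // let tasks try to finish first

        // Extra slots a job may use beyond its limit, growing gradually
        // once the cluster has been idle long enough.  During the initial
        // delay, newly submitted jobs get a chance to claim the free slots.
        static int allowedOverage(long idleSinceMs, long nowMs) {
            long idleMs = nowMs - idleSinceMs;
            if (idleMs < RAMP_UP_DELAY_MS) {
                return 0;
            }
            return (int) ((idleMs - RAMP_UP_DELAY_MS)
                          * OVERAGE_SLOTS_PER_MINUTE / (60 * 1000L));
        }

        // When a more deserving job arrives, over-limit tasks are not
        // killed immediately; they first get a grace period in which to
        // finish normally.
        static boolean shouldKillOverLimitTask(long newJobArrivedMs, long nowMs) {
            return nowMs - newJobArrivedMs > KILL_GRACE_MS;
        }
    }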

Ideally the limit would be dynamic, perhaps something like max(10, #slots/#jobs), so jobs would only be queued when there are fewer than 10 slots per job. But a static limit would still be a significant improvement and would be easier to implement in a first version.
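
In code, with an illustrative wrapper class, the dynamic limit would
just be:

    class DynamicLimit {
        // max(10, #slots/#jobs): with plenty of capacity each job's limit
        // is its even share; under contention the floor of 10 means jobs
        // queue only when there are fewer than 10 slots per job.
        static int perJobLimit(int totalSlots, int runningJobs) {
            if (runningJobs == 0) {
                return totalSlots;   // no contention, no effective limit
            }
            return Math.max(10, totalSlots / runningJobs);
        }
    }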

Doug
