Re: Job scheduling (Re: Unable to run more than one job concurrently)

Eric Baldeschwieler Fri, 19 May 2006 20:43:34 -0700

Yes, please keep it simple.

I know folks in research with many ideas for how to add features tothe programming model and we still have a lot of work to do to makeone job run efficiently. I think it is best to not make thesedirections harder by adding many features that will need to besupported to the scheduler. Let's master running one job well beforeputting a lot of effort into running lots of jobs.

That said, priority queuing seems simple and separable from myconcerns. Interleaving jobs gets more complicated and is somethingI'd be happy to see deferred. Doing so by default with decayingpriorities and such doesn't seem like the right trade-off to me.This is primarily a batch system.


On May 19, 2006, at 2:09 PM, Paul Sutter wrote:

right now, i would prefer that we have a really really simplescheme that
can be done quickly.

being able to run a huge days-long job in the background, such that a
smaller hours-long job basically takes over and runs to completion,that
would be a big win.

having multple jobs with time slicing? cool, but only 20% better and
probably 80% more work.
if we want a thorough scheme, i think eric has the right idea ofusing one
of the existing scheduler packages.


On 5/19/06, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
Doug Cutting wrote:
> Paul Sutter wrote:
>> (1) Allow submission times in the future, enabling the creation of
>> "background" jobs. My understanding is that job submissiontimes are
>> used to
>> prioritize scheduling. All tasks from a job submitted early run to
>> completion before those of a job submitted later. If we couldsubmit
any
>> days-long jobs with a submission time in the future, say the year
>> 2010, and
>> any short hours-long jobs with the current time, that short jobwould
be
>> able to interrupt the long job. Hack? Yes. Useful? I think so.
>
> I think this is equivalent to adding a job priority, where taskswith
> the highest priority job are run first.  If jobs are at the same
> priority, then the first submitted would run. Adding prioritywould
> add a bit more complexity, but would also be less of a hack.


Hmm.. If you compare it to a Unix scheduler, processes at the same
priority have even chances of being run, regardless of which wasstartedfirst - not only that, processes undergo a "priority decay", inthat if
they are running longer then their priority is lowered - this enables
new processes to start quickly (and maybe quickly finish), and then
fairly compete with other processes.

In our case, this would mean that jobs with the same priority would
execute concurrently, sharing available map/reduce slots, and long
running jobs would be gradually de-prioritized. This also meansthat thefirst job will slow down when the second one is started, but thesecondjob will have a chance to make a good start (and perhaps quicklyfinish)and then, subject to the priority decay, run in parallel withother jobs
(albeit slower) instead of being stuck in the wait queue.

And if the second job is started with a higher priority, it should
preempt the first job (i.e. it should get proportionally moreslots than
the first job). If you need all cluster resources for a specific job,
and don't want any other jobs to run, just set the priority to the
highest value, thus preempting all other jobs (actually, it would
suspend other already executing jobs, which would resume when yourjob
is done - not a bad feature either!).

I think this is a relatively simple and well understood mechanism.
>> (2) Have a per-job total task count limit. Currently, weestablish the>> number of tasks each node runs, and how many map or reducetasks we
have
>> total in a given job. But it would be great if we could set aceiling
>> on the
>> number of tasks that run concurrently for a given job. This mayhelp
>> with
>> Andrzej's fetcher (since it is bandwidth constrained, maybe fewer
>> concurrent
>> jobs would be fine?).
>
> I like this idea. So if the highest-priority job is alreadyrunning
> at its task limit, then tasks can be run from the next
> highest-priority job.  Should there be separate limits for maps and
> reduces?
I like this idea too. I think a similar setting for the minimumnumberof tasks would be needed too? That would solve my problem. Infact, itwould be probably better than the schema I described above,because it
would guarantee certain minimum tasks running at any time.

This reminds me of the "idle time" and "real time" policies in BSD
scheduler ... man 1 rtprio. The "real time" policy prevents theprioritydecay that normally occurs, and "idle time" policy allowsprocesses to
run only if CPU is idle.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Re: Job scheduling (Re: Unable to run more than one job concurrently)

Reply via email to