Aaron Kimball wrote:
Multiple students should be able to submit jobs and if one student's poorly-written task is grinding up a lot of cycles on a shared cluster, other students still need to be able to test their code in the meantime;

I think a simple approach to address this is to limit the number of tasks from a single job that are permitted to execute simultaneously. If, for example, you have a cluster of 50 dual-core nodes with 100 map task slots and 100 reduce task slots, and the configured limit is 25 simultaneous tasks per job, then at least four jobs can run at a time (100 slots / 25 tasks per job), and more if some jobs need fewer than 25 tasks. This lets faster jobs pass slower ones. This approach also avoids some problems we've seen with HOD (Hadoop On Demand), where nodes are underutilized during the tail of jobs, and problems with input locality.
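
To make the arithmetic concrete, here is a rough sketch of the idea in Python. It is only an illustration, not JobTracker code: the function name allocate_slots, the job names, and the task counts are all made up for the example, and it assumes slots are handed out in submission order.

```python
def allocate_slots(pending, total_slots, per_job_limit):
    """Hand out free task slots in submission order, capping each job.

    pending: dict of job name -> number of runnable tasks
             (dict order is treated as submission order; hypothetical data)
    Returns a dict of job name -> number of tasks started.
    """
    started = {}
    free = total_slots
    for job, tasks in pending.items():
        if free == 0:
            break
        # A job gets no more than its own demand, the per-job cap,
        # or whatever slots are still free.
        grant = min(tasks, per_job_limit, free)
        if grant > 0:
            started[job] = grant
            free -= grant
    return started


# Hypothetical workload: one large job submitted ahead of three small ones.
pending = {"big_job": 500, "student_a": 10, "student_b": 40, "student_c": 5}

capped = allocate_slots(pending, total_slots=100, per_job_limit=25)
uncapped = allocate_slots(pending, total_slots=100, per_job_limit=100)

print(capped)    # every job gets some slots
print(uncapped)  # the first job takes the whole cluster
```

With the cap of 25, all four jobs start tasks right away; with no effective cap, big_job takes all 100 slots and the other three wait. In this sketch a strict cap can also leave slots idle when few jobs are queued, so a real allocator would probably want to treat the limit as configurable.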

The JobTracker already handles simultaneously executing jobs, so the primary change required is to task allocation, which should not prove intractable.

I've added a Jira issue for this:


Please add further comments there.

