Re: [gt-user] Limits when submitting jobs to pbs through globus

Ioan Raicu Wed, 28 May 2008 10:45:29 -0700

Hi,

Jan Ploski wrote:

[EMAIL PROTECTED] wrote on 05/28/2008 05:43:08 AM:
On May 27, 2008, at 12:20 AM, Yuriy wrote:
We have a 10-node cluster with 2 quad-core processors per node, and when the number of jobs is greater than 160
Why are you treating Globus job submissions like an extended batch queue?
...
Anyone want to chime in?
I would say that treating Globus job submissions like an extended batch queue should be among the allowable use cases. Users of a local batch scheduler may view the "Grid" as a drop-in replacement, which they expect to be at least as easy to use and as efficient, just more fault-tolerant and scalable. The fewer gotchas, incompatibilities, weird issues, and technical workarounds they must care about, the better. It's hard enough to convince them to abandon their familiar client software. So if Globus consumes system resources even for idle jobs, that seems to me a design or implementation flaw in Globus, not a user's misunderstanding. (I believe it is no longer as bad in GT 4 as it was in earlier versions.

GRAM2 was notoriously bad at consuming lots of resources even for idle jobs. GRAM4 is much better at handling many concurrent submissions and at keeping jobs queued in an idle state until processors become available. Part of the problem is that GRAM and the LRMs are loosely coupled and interact through log files, ssh sessions, etc. If each LRM (e.g. Condor, PBS, SGE) defined standard interfaces (i.e. WS), GRAM could interact with them more efficiently; but they do not, so there is only so much the implementation can do with the interfaces currently available. Production LRMs also have scalability problems of their own: their performance degrades significantly when their queues grow or when status information is queried too often. The LRMs are improving, but many production Grids still run older versions of them, which had performance and scalability issues under high load.

Your remarks about the need for client-side submission throttling are of course correct.)

It is important to know the limitations of the resource management infrastructure, and use throttling to ensure that you stay within the safe margins of performance.
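A minimal sketch of client-side submission throttling, assuming the client can submit a job and later ask whether it has finished. The hooks `submit` and `is_finished`, the function `run_throttled`, and the `max_in_flight` and `poll_interval` values are all hypothetical illustrations, not part of any Globus or PBS API; wrap whatever submission tool you actually use. The idea is to cap the number of unfinished jobs rather than push the whole batch at once, and to query status for all outstanding jobs at most once per interval, which also limits how often the LRM is asked for status.

```python
import time
from typing import Callable, Hashable, Iterable, List


def run_throttled(
    jobs: Iterable[str],
    submit: Callable[[str], Hashable],          # hypothetical hook: submit one job, return its id
    is_finished: Callable[[Hashable], bool],    # hypothetical hook: True once the job has completed
    max_in_flight: int,                         # site-specific cap on unfinished jobs (assumption)
    poll_interval: float = 30.0,                # seconds between status passes (assumption)
) -> List[Hashable]:
    """Submit jobs in order, never keeping more than max_in_flight unfinished."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    pending = list(jobs)
    pending.reverse()  # pop() from the end then yields jobs in their original order
    in_flight: List[Hashable] = []
    submitted: List[Hashable] = []
    while pending or in_flight:
        # Top up the window while there is room and work left.
        while pending and len(in_flight) < max_in_flight:
            job_id = submit(pending.pop())
            in_flight.append(job_id)
            submitted.append(job_id)
        # One status pass over the outstanding jobs; keep only the unfinished ones.
        in_flight = [j for j in in_flight if not is_finished(j)]
        if in_flight:
            # Wait before the next pass so status queries stay infrequent.
            time.sleep(poll_interval)
    return submitted
```

For example, with a hypothetical limit of 3, `run_throttled(names, my_submit, my_is_finished, max_in_flight=3)` never has more than three jobs outstanding. Capping outstanding jobs (rather than submission rate) keeps the load on GRAM and the LRM queue bounded regardless of how long individual jobs run.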


Cheers,
Ioan

Regards,
Jan Ploski


--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: [EMAIL PROTECTED]
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
