On May 27, 2008, at 12:20 AM, Yuriy wrote:
We have 10 node cluster with 2
quad-core processors per node, and when number of jobs is greater than
160
Why are you treating Globus job submissions like an extended batch
queue?
There are a number of things you can do to cut down on the number of
open connections that are tied up just waiting for batch slots to
open on your resource. Note that in this case you have oversubscribed
the resource two-to-one: 160 Globus-managed jobs against the 80 cores
(10 nodes x 2 quad-core processors) that can actually process them
simultaneously. This does not make any sense -- half of your
connections are simply waiting for resources to become available.
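To make the arithmetic concrete, here is a small sketch in Python. The
node and processor counts come from the quoted message; the assumption
that each job occupies exactly one core is mine, so adjust it if your
jobs are multi-threaded.

```python
# Figures from the quoted message: 10 nodes, 2 quad-core processors each.
NODES = 10
CPUS_PER_NODE = 2
CORES_PER_CPU = 4
SUBMITTED_JOBS = 160  # the job count mentioned in the quoted message

# Assumption: one job occupies one core.
total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
waiting_jobs = max(0, SUBMITTED_JOBS - total_cores)
ratio = SUBMITTED_JOBS / total_cores

print(f"cores: {total_cores}, submitted: {SUBMITTED_JOBS}, "
      f"waiting: {waiting_jobs}, ratio: {ratio:.1f}:1")
```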
Here are some options:
(1) Use an external scheduler such as Condor-G to throttle the number
of submissions to your remote resource to a reasonable value.
(2) Alternatively, use a pilot-job or glide-in job submission
scenario: send a single job to the remote grid resource, then
interact with that job locally to handle your individual submissions.
(3) Combine either or both of the above with an adaptive workflow
management tool (e.g. Pegasus, Kepler, GridWay). The resulting
combination can be tuned to adapt to changes in availability among
multiple remote resources, allowing you to move jobs to where
resources are available.
From my point of view, this is one of the most common mistakes in
handling multiple grid jobs: expecting that multiple grid job
submissions will behave as they would if submitted to a simple local
queue. The key point to understand, I believe, is that a grid job
is "live" while it is communicating with your resource, so you should
use one of the above strategies to minimize the number of live
connections.
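For the pilot-job variant, the shape of the idea can be sketched in a
few lines of Python. Again this is only an illustration under my own
assumptions: pilot and run_task are hypothetical names, the queue here
is an in-process one, and a real pilot would pull its work over the
network from a queue you control.

```python
import queue


def pilot(task_queue, run_task):
    """One long-lived job: drain the local queue, one task at a time."""
    results = []
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            # Queue drained: the pilot exits and frees its batch slot.
            return results
        results.append(run_task(task))


# Five local submissions are served by a single pilot, i.e. a single
# live grid job instead of five.
tasks = queue.Queue()
for n in range(5):
    tasks.put(n)

results = pilot(tasks, run_task=lambda n: n * n)
print(results)
```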
P.S.: I have encouraged the Globus team privately to consider
integrating pilot-job or glide-in capabilities more closely into the
core software to provide the user with easier hooks for this, and to
minimize the need for users to reinvent this type of workflow
control. There could be other ideas out there to handle this also.
Anyone want to chime in?
Hope this helps,
Alan Sill, Ph.D.
TIGRE Senior Scientist, High Performance Computing Center
Adjunct Professor of Physics
TTU
====================================================================
: Alan Sill, Texas Tech University Office: Admin 233, MS 4-1167 :
: e-mail: [EMAIL PROTECTED] ph. 806-742-4350 fax 806-742-4358 :
====================================================================