Our single-np jobs take 3-6 seconds to start when there are already jobs running on the worker
nodes (configured with 2-8 nps). Even with qrun -a, a job still takes close to 2 seconds to start.
Starting jobs on a batch of "free" worker nodes is really fast, but most of the time we have some jobs
already running on the worker nodes. Would Moab's ASYNCSTART help in this case? Do the jobs actually
get started, or are they just being pushed to Torque at a higher rate?
Thanks,
...
ling
Josh Butikofer wrote:
First of all, what is the average size of these jobs? Are they single node
jobs, or is there a good mix between parallel and single node jobs? A parallel
job will take a bit longer to start-up due to the sisters needing to be
contacted by the mother superior, etc.
Yeah, Moab's ASYNCSTART option really does help. There are a few other options that
can also give a speed boost. In our best tests, Moab & TORQUE can start 50
jobs/sec. I haven't tried the same benchmark with Maui. I'll look through my
benchmark setup to see if there are more options/tweaks that Maui can take
advantage of.
Josh Butikofer
Cluster Resources, Inc.
#############################
----- "Stijn De Weirdt" <[email protected]> wrote:
hi all,
(this is a crosspost to both maui and torque users list)
we are having issues with the job start rate using maui+torque. starting
a job takes on average 2 seconds, which is slow for what our users are
dumping in our queues.
by a job start i mean the following cycle:
04/01 10:01:08 MRMJobStart(374900,Msg,SC)
04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
04/01 10:01:08 MPBSJobModify(374900,Resource_List,Resource,node088.gengar.gent.vsc)
04/01 10:01:10 MPBSJobModify(374900,Resource_List,Resource,1)
04/01 10:01:10 INFO: job '374900' successfully started
04/01 10:01:10 INFO: command sent to server
04/01 10:01:10 INFO: response received from server
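
For anyone who wants to compare numbers, here is a rough Python sketch that measures this cycle from a log. It is only an illustration: it assumes the line format shown in the excerpt above (a month/day timestamp, then the message), treats the MRMJobStart line as the beginning and the "successfully started" line as the end, and expects purely numeric job ids. The names start_latencies and SAMPLE_LOG are made up for this example, and the sample is pasted in rather than read from a real log file.

```python
import re
from datetime import datetime

# Sample copied from the excerpt above; a real run would read the scheduler log.
SAMPLE_LOG = """\
04/01 10:01:08 MRMJobStart(374900,Msg,SC)
04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
04/01 10:01:08 MPBSJobModify(374900,Resource_List,Resource,node088.gengar.gent.vsc)
04/01 10:01:10 MPBSJobModify(374900,Resource_List,Resource,1)
04/01 10:01:10 INFO: job '374900' successfully started
"""

LINE_RE = re.compile(r"^(\d\d/\d\d \d\d:\d\d:\d\d)\s+(.*)$")
BEGIN_RE = re.compile(r"^MRMJobStart\((\d+),")
DONE_RE = re.compile(r"^INFO:\s+job '(\d+)' successfully started")


def parse_ts(stamp):
    # The log carries no year; 2000 is an arbitrary placeholder (assumption),
    # so differences are only meaningful within one calendar year.
    return datetime.strptime("2000/" + stamp, "%Y/%m/%d %H:%M:%S")


def start_latencies(lines):
    """Return {job_id: seconds from MRMJobStart to 'successfully started'}."""
    began = {}
    result = {}
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # wrapped, blank, or unrelated line
        when, msg = parse_ts(m.group(1)), m.group(2)
        b = BEGIN_RE.match(msg)
        if b:
            began[b.group(1)] = when
            continue
        d = DONE_RE.match(msg)
        if d and d.group(1) in began:
            job = d.group(1)
            result[job] = (when - began.pop(job)).total_seconds()
    return result


if __name__ == "__main__":
    for job, secs in start_latencies(SAMPLE_LOG.splitlines()).items():
        print(f"job {job}: {secs:.0f} s")
```

Since the timestamps only have one-second resolution, a single job gives a coarse figure; averaging over many jobs gives a more useful start rate.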
i've already tried to follow the "large cluster" tuning tips to see if
it helps, but no real result (the only tip that might solve the
problem is the asyncstart option from moab ;). we have a 200 node,
8 core/node cluster (i actually don't think this is "large").
anyway, before i dig in the code looking for options, i'm wondering what
other people are seeing as minimal start time, so i know if it is
possible at all.
many thanks,
stijn
--
The system will shutdown in 5 minutes.
_______________________________________________
torqueusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/torqueusers
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers