Hi Bas,

>> The jobs are held by torque so maui does not see all jobs. So this will
>> prevent flooding the maui queues.

based on my first tests: works as expected. Thank you very much. The result:

> user="testuser" && id="testjob" && \
> echo "Maui:" && /usr/local/maui/bin/showq|grep $user|wc -l && \
> echo "Torque total:" && qstat -u $user|grep $id|wc -l && \
> echo "Torque 'default':" && qstat -u $user|grep $id|grep default|wc -l && \
> echo "Torque 'small':" && qstat -u $user|grep $id|grep small_6h|wc -l && \
> echo "Torque running:" && qstat -u $user|grep $id|grep R|wc -l && \
> echo "Torque queued:" && qstat -u $user|grep $id|grep Q|wc -l
> Maui:
> 500
> Torque total:
> 1000
> Torque 'default':
> 500
> Torque 'small':
> 500
> Torque running:
> 354
> Torque queued:
> 646

I tested your approach with the following setting:

> $ qmgr -c 'p s'|grep "default queue_type"
> set queue default queue_type = Route
> $ qmgr -c 'p s'|grep "small_6h max_user_queuable"
> set queue small_6h max_user_queuable = 500

Then I sent 1000 test jobs:

> for i in `seq 1 1000`; do echo "sleep 240" | qsub -l cput=01:00:00 -N testjob; done

Best regards, Alex

On 13.02.2011, at 12:27, Alexander Willner wrote:

> Hi Bas,
>
>> we just limit the number of job a user can submit in a execution queue, for
>> example for a 512 node cluster. we have set for the serial queue.
>
> this might be valid approach.
> Let me try to summarize:
>
> * Configuration (with x <= number of nodes)
>   * Execution queues (max_user_queuable = x): queue_1, queue_2, ..., queue_n
>   * Routing queue: queue_default
> * Workflow
>   * User 1 submits y >> x jobs to the default queue
>   * User 2 submits z jobs to the default queue
> * Scheduling
>   * Maui only sees x*n jobs (so the hard limit of about 4096 jobs would be ok)
>   * The user can submit as many jobs as he wants
>   * Torque moves the jobs to the execution queues based on a fair scheduling
>     configuration
>
> Best regards, Alex
>
> On 13.02.2011, at 00:20, Bas van der Vlies wrote:
>
>> Alexander,
>>
>> On 12 feb 2011, at 13:49, Alexander Willner wrote:
>>
>>> Hi Roy,
>>>
>>> thank you for your answer.
>>>
>>> On 11.02.2011, at 22:03, Roy Dragseth wrote:
>>>> We have upped the job limit significantly, we currently set the limit to
>>>> 32000, but you need to recompile maui for this.
>>>
>>> How exactly have you achieved this? I already pushed the limit to 16384 by
>>> following:
>>>
>>> On Friday, February 11, 2011 17:29:39 Alexander Willner wrote:
>>>> (even though I've tested [2])
>>>
>>> I recompiled the sources, installed them and restarted maui. Still I only
>>> have short list of queued jobs:
>>>
>>>> $ qstat|wc -l
>>>> 9482
>>>
>>>> $ /usr/local/maui/bin/showq|wc -l
>>>> 3773
>>>> $ qstat|tail -n1
>>>> 625162.xxxx xxxx xxxx x xxxxx xxxxx
>>>> $ runjob 625162
>>>> ERROR: 'runjob' failed
>>>> ERROR: cannot locate job '625162'
>>>
>>>
>>> Best regards, Alex
>>>
>>> [2] http://www.supercluster.org/pipermail/mauiusers/2007-April/002705.html
>>>
>>> --
>>> net.cs.bonn.edu/willner
>>>
>> we just limit the number of job a user can submit in a execution queue, for
>> example for a 512 node cluster. we have set for the serial queue.
>> {{{
>> create queue q_serial
>> set queue q_serial queue_type = Execution
>> set queue q_serial max_user_queuable = 512
>> set queue q_serial acl_host_enable = False
>> set queue q_serial resources_max.nodect = 1
>> set queue q_serial resources_default.ncpus = 1
>> set queue q_serial resources_default.neednodes = q_serial
>> set queue q_serial resources_default.nodes = 1
>> set queue q_serial enabled = True
>> set queue q_serial started = True
>> }}}
>>
>> A user cannot run more than 512 jobs; this is equal to the number of nodes
>> in the cluster. The other jobs are held in the routing queue. So every time
>> a job has finished, a new job can enter the execution queue. The jobs
>> are held by torque so maui does not see all jobs. So this will prevent
>> flooding the maui queues.
>>
>> regards
>>
>> --
>> Bas van der Vlies
>> [email protected]
>>
>
> --
> net.cs.bonn.edu/willner
>

--
net.cs.bonn.edu/willner
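
A note on the counting one-liner quoted in this thread: `grep R` and `grep Q` match those letters anywhere on the line, so a user, queue, or job name containing an R or a Q would be counted as running or queued. Below is a minimal sketch of a stricter count. It assumes the `qstat -u` layout in which field 3 is the queue, field 4 the job name, and field 10 the state; `summarize_jobs` is a hypothetical helper name, not part of Torque.

```shell
#!/bin/sh
# Summarize `qstat -u USER` output (read from stdin) for one job name.
# Assumed column layout of `qstat -u`:
#   1=Job ID  2=Username  3=Queue  4=Jobname  ...  10=S (state)  11=Elap Time
# Header and separator lines are skipped because their 4th field
# never equals the job name.
summarize_jobs() {
    awk -v name="$1" '
        $4 == name {
            total++          # every job with this exact name
            queue[$3]++      # per-queue count (e.g. default, small_6h)
            state[$10]++     # per-state count (e.g. R, Q)
        }
        END {
            printf "total %d\n", total
            for (q in queue) printf "queue %s %d\n", q, queue[q]
            for (s in state) printf "state %s %d\n", s, state[s]
        }
    ' | LC_ALL=C sort        # awk array order is unspecified, so sort it
}
```

Possible usage, with the user and job name from the test above: `qstat -u testuser | summarize_jobs testjob`. Comparing whole fields rather than grepping the line should keep the counts correct even if a name happens to contain an R or a Q; if the local `qstat` prints different columns, the field numbers need adjusting.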
_______________________________________________ mauiusers mailing list [email protected] http://www.supercluster.org/mailman/listinfo/mauiusers
