Hi Bas,

>> The jobs are held by Torque, so Maui does not see all jobs. This will
>> prevent flooding the Maui queues.


Based on my first tests, it works as expected. Thank you very much. The results:

> user="testuser" && id="testjob" && \
> echo "Maui:" && /usr/local/maui/bin/showq|grep $user|wc -l && \
> echo "Torque total:" && qstat -u $user|grep $id|wc -l && \
> echo "Torque 'default':" && qstat -u $user|grep $id|grep default|wc -l && \
> echo "Torque 'small':" && qstat -u $user|grep $id|grep small_6h|wc -l && \
> echo "Torque running:" && qstat -u $user|grep $id|grep R|wc -l && \
> echo "Torque queued:" && qstat -u $user|grep $id|grep Q|wc -l 

> Maui:
> 500
> Torque total:
> 1000
> Torque 'default':
> 500
> Torque 'small':
> 500
> Torque running:
> 354
> Torque queued:
> 646
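A caveat on the counts above: `grep R` and `grep Q` match those letters anywhere on a line, so they only give the right numbers here because neither the user name, the job name nor the queue names contain an uppercase R or Q. A sketch that counts by column instead is below. It assumes the usual `qstat -a`/`qstat -u` layout, where the job state `S` is the 10th whitespace-separated field, and that the user and job names are short enough not to be truncated by `qstat` (roughly 8 and 16 characters). Check both against your Torque version. `summarize_jobs` is a hypothetical helper name, not a Torque command.

```shell
# summarize_jobs USER JOBNAME
# Hypothetical helper: reads `qstat -u USER` output on stdin and counts the
# matching jobs per queue and per state. Assumes the state ("S") is the 10th
# whitespace-separated column, as in the usual Torque `qstat -a` layout.
summarize_jobs() {
  awk -v user="$1" -v name="$2" '
    # Header and separator lines do not match, so only job lines are counted.
    $2 == user && $4 == name {
      total++
      queue[$3]++
      state[$10]++
    }
    END {
      printf "total %d\n", total
      for (q in queue) printf "queue %s %d\n", q, queue[q]
      printf "running %d\n", state["R"]
      printf "queued %d\n", state["Q"]
    }'
}

# Example usage: qstat -u testuser | summarize_jobs testuser testjob
```

Matching exact fields rather than substrings also avoids counting jobs of another user whose job name happens to contain the search string.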

I tested your approach with the following settings:

> $ qmgr -c 'p s'|grep "default queue_type"
> set queue default queue_type = Route
> $ qmgr -c 'p s'|grep "small_6h max_user_queuable"
> set queue small_6h max_user_queuable = 500
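For reference, a minimal sketch of a complete routing setup around these two settings. Only `queue_type = Route` for `default` and `max_user_queuable = 500` for `small_6h` come from the test above; the `route_destinations` and `default_queue` assignments, the `enabled`/`started` flags and the Execution type of `small_6h` are assumptions about a typical Torque setup and should be compared with your own `qmgr -c 'p s'` output.

```shell
# Routing queue: users submit here; Torque forwards jobs to the execution queue.
qmgr -c 'create queue default'
qmgr -c 'set queue default queue_type = Route'
qmgr -c 'set queue default route_destinations = small_6h'
qmgr -c 'set queue default enabled = True'
qmgr -c 'set queue default started = True'

# Execution queue: at most 500 jobs (queued or running) per user are accepted;
# the remaining jobs wait in the routing queue, where Maui does not see them.
qmgr -c 'set queue small_6h queue_type = Execution'
qmgr -c 'set queue small_6h max_user_queuable = 500'

# Make the routing queue the default submission target.
qmgr -c 'set server default_queue = default'
```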

Then I submitted 1000 test jobs:

> for i in $(seq 1 1000); do echo "sleep 240" | qsub -l cput=01:00:00 -N testjob; done


Best regards, Alex

On 13.02.2011, at 12:27, Alexander Willner wrote:

> Hi Bas,
> 
>> we just limit the number of jobs a user can submit to an execution queue. For
>> example, for a 512-node cluster we have set the following for the serial queue:
> 
> this might be a valid approach. Let me try to summarize:
> 
> * Configuration (with x <= number of nodes)
>  * Execution queues (max_user_queuable = x): queue_1, queue_2, ..., queue_n
>  * Routing queue: queue_default
> * Workflow
>  * User 1 submits y >> x jobs to the default queue
>  * User 2 submits z jobs to the default queue
> * Scheduling
>  * Maui only sees at most x*n jobs per user (so the hard limit of about 4096 jobs
> would be OK, provided x*n times the number of active users stays below it)
>  * The user can submit as many jobs as he wants
>  * Torque moves the jobs to the execution queues based on a fair scheduling 
> configuration
> 
> Best regards, Alex
> 
> On 13.02.2011, at 00:20, Bas van der Vlies wrote:
> 
>> Alexander,
>> 
>> On 12 feb 2011, at 13:49, Alexander Willner wrote:
>> 
>>> Hi Roy,
>>> 
>>> thank you for your answer. 
>>> 
>>> On 11.02.2011, at 22:03, Roy Dragseth wrote:
>>>> We have raised the job limit significantly; we currently set it to 32000,
>>>> but you need to recompile Maui for this.
>>> 
>>> How exactly have you achieved this? I already pushed the limit to 16384 by 
>>> following:
>>> 
>>> On Friday, February 11, 2011 17:29:39 Alexander Willner wrote:
>>>> (even though I've tested [2])
>>> 
>>> I recompiled the sources, installed them and restarted Maui. Still, I only
>>> have a short list of queued jobs:
>>> 
>>>> $ qstat|wc -l
>>>> 9482
>>> 
>>>> $ /usr/local/maui/bin/showq|wc -l
>>>> 3773
>>>> $ qstat|tail -n1
>>>> 625162.xxxx  xxxx   xxxx x xxxxx  xxxxx   
>>>> $ runjob 625162
>>>> ERROR:    'runjob' failed
>>>> ERROR:  cannot locate job '625162'
>>> 
>>> 
>>> Best regards, Alex
>>> 
>>> [2] http://www.supercluster.org/pipermail/mauiusers/2007-April/002705.html
>>> 
>>> --
>>> net.cs.bonn.edu/willner
>>> 
>> we just limit the number of jobs a user can submit to an execution queue. For
>> example, for a 512-node cluster we have set the following for the serial queue:
>> {{{
>> create queue q_serial
>> set queue q_serial queue_type = Execution
>> set queue q_serial max_user_queuable = 512
>> set queue q_serial acl_host_enable = False
>> set queue q_serial resources_max.nodect = 1
>> set queue q_serial resources_default.ncpus = 1
>> set queue q_serial resources_default.neednodes = q_serial
>> set queue q_serial resources_default.nodes = 1
>> set queue q_serial enabled = True
>> set queue q_serial started = True
>> }}}
>> 
>> A user cannot have more than 512 jobs in the execution queue, which equals the
>> number of nodes in the cluster. The other jobs are held in the routing queue,
>> so every time a job finishes, a new job can enter the execution queue. The
>> jobs are held by Torque, so Maui does not see all of them. This prevents
>> flooding the Maui queues.
>> 
>> regards
>> 
>> 
>> --
>> Bas van der Vlies
>> [email protected]
>> 
>> 
>> 
> 
> --
> net.cs.bonn.edu/willner
> 

--
net.cs.bonn.edu/willner


_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
