On 13 feb 2011, at 12:27, Alexander Willner wrote:

> Hi Bas,
> 
>> We just limit the number of jobs a user can submit to an execution queue. For 
>> example, for a 512-node cluster we have set this for the serial queue.
> 
> This might be a valid approach. Let me try to summarize:
> 
> * Configuration (with x <= number of nodes)
>  * Execution queues (max_user_queuable = x): queue_1, queue_2, ..., queue_n
>  * Routing queue: queue_default
> * Workflow
>  * User 1 submits y >> x jobs to the default queue
>  * User 2 submits z jobs to the default queue
> * Scheduling
>  * Maui only sees x*n jobs (so the hard limit of about 4096 jobs would be ok)
>  * The user can submit as many jobs as he wants
>  * Torque moves the jobs to the execution queues based on a fair scheduling
>    configuration
> 

Alex,

The summary is right. We use this setup for our clusters, and we have 
increased the maximum number of jobs for maui to 32000. The patches are in our 
maui_2_deb software. I will put the source on our open-source website in the 
near future. Most patches are already applied to the maui source.
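
For reference, the configuration summarized above could be sketched roughly as 
follows. This is only an illustration, not our production setup: the queue 
names (queue_default, queue_1 ... queue_n) are taken from the summary, 
max_user_queuable and queue_type are used as in the q_serial example, and the 
use of route_destinations and default_queue here is an assumption about how 
the routing side would be wired up. The function name gen_queue_config is 
hypothetical. The script only prints qmgr directives and does not touch a 
running pbs_server.

```shell
#!/bin/sh
# Sketch: print qmgr directives for n execution queues fed by one routing queue.
# Assumption: queue names follow the summary (queue_1..queue_n, queue_default).
gen_queue_config() {
    n="$1"   # number of execution queues
    x="$2"   # per-user limit in each execution queue (x <= number of nodes)

    # Execution queues: each one admits at most x jobs per user.
    i=1
    while [ "$i" -le "$n" ]; do
        echo "create queue queue_$i"
        echo "set queue queue_$i queue_type = Execution"
        echo "set queue queue_$i max_user_queuable = $x"
        echo "set queue queue_$i enabled = True"
        echo "set queue queue_$i started = True"
        i=$((i + 1))
    done

    # Routing queue: holds the remaining jobs until an execution queue accepts them.
    echo "create queue queue_default"
    echo "set queue queue_default queue_type = Route"
    i=1
    while [ "$i" -le "$n" ]; do
        # The first destination is assigned with "=", later ones appended with "+=".
        if [ "$i" -eq 1 ]; then op="="; else op="+="; fi
        echo "set queue queue_default route_destinations $op queue_$i"
        i=$((i + 1))
    done
    echo "set queue queue_default enabled = True"
    echo "set queue queue_default started = True"

    # Jobs submitted without an explicit queue land in the routing queue.
    echo "set server default_queue = queue_default"
}
```

The output can be reviewed first and then fed to qmgr on the server host, for 
example with "gen_queue_config 2 512 | qmgr". With n queues and a limit of x, 
maui should see at most x*n queued jobs per user.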

regards

> Best regards, Alex
> 
> On 13.02.2011, at 00:20, Bas van der Vlies wrote:
> 
>> Alexander,
>> 
>> On 12 feb 2011, at 13:49, Alexander Willner wrote:
>> 
>>> Hi Roy,
>>> 
>>> thank you for your answer. 
>>> 
>>> On 11.02.2011, at 22:03, Roy Dragseth wrote:
>>>> We have upped the job limit significantly, we currently set the limit to 
>>>> 32000, 
>>>> but you need to recompile maui for this.
>>> 
>>> How exactly have you achieved this? I already pushed the limit to 16384 by 
>>> following:
>>> 
>>> On Friday, February 11, 2011 17:29:39 Alexander Willner wrote:
>>>> (even though I've tested [2])
>>> 
>>> I recompiled the sources, installed them and restarted maui. Still, I only 
>>> have a short list of queued jobs:
>>> 
>>>> $ qstat|wc -l
>>>> 9482
>>> 
>>>> $ /usr/local/maui/bin/showq|wc -l
>>>> 3773
>>>> $ qstat|tail -n1
>>>> 625162.xxxx  xxxx   xxxx x xxxxx  xxxxx   
>>>> $ runjob 625162
>>>> ERROR:    'runjob' failed
>>>> ERROR:  cannot locate job '625162'
>>> 
>>> 
>>> Best regards, Alex
>>> 
>>> [2] http://www.supercluster.org/pipermail/mauiusers/2007-April/002705.html
>>> 
>>> --
>>> net.cs.bonn.edu/willner
>>> 
>> We just limit the number of jobs a user can submit to an execution queue. For 
>> example, for a 512-node cluster we have set this for the serial queue:
>> {{{
>> create queue q_serial
>> set queue q_serial queue_type = Execution
>> set queue q_serial max_user_queuable = 512
>> set queue q_serial acl_host_enable = False
>> set queue q_serial resources_max.nodect = 1
>> set queue q_serial resources_default.ncpus = 1
>> set queue q_serial resources_default.neednodes = q_serial
>> set queue q_serial resources_default.nodes = 1
>> set queue q_serial enabled = True
>> set queue q_serial started = True
>> }}}
>> 
>> A user cannot run more than 512 jobs, which is equal to the number of nodes 
>> in the cluster. The other jobs are held in the routing queue, so every time 
>> a job finishes, a new job can enter the execution queue. The jobs are held 
>> by torque, so maui does not see all jobs. This prevents flooding the maui 
>> queues.
>> 
>> regards
>> 
>> 
>> --
>> Bas van der Vlies
>> [email protected]
>> 
>> 
>> 
> 
> --
> net.cs.bonn.edu/willner
> 

--
Bas van der Vlies
[email protected]



_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
