> Lennart,
> Thank you for the input. I attempted the following with no luck.
>
> 1) As the queues were set up in Maui by the vendor, I set
> resources_max.nodect for the server with the qmgr command
> "set server resources_max.nodect = 140" and restarted PBS on the
> master. I repeated the test and got the same output from checkjob
> and diagnose (diagnose output included below; the exact commands are
> sketched after this list).
>
> 2) I dug into the Bugzilla report on bug 99 as you suggested. I'm not
> quite sure that this is the exact problem I'm experiencing, as
> diagnose reports only that the job has been put on batch hold rather
> than that it violates a MAXPROC limit.
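>
> For reference, the commands from step 1 were roughly as follows (the
> restart method may differ per installation, and <jobid> stands for
> the held job's id):
>
> qmgr -c 'set server resources_max.nodect = 140'
> qterm -t quick
> pbs_server
> checkjob <jobid>
> diagnose -j <jobid>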
>
> I'm including the maui.cfg as well, in case it can provide some
> insight to anyone.
> OK, I just made a big newbie mistake; pardon the repost to correct it.
>
> I finally got qmgr to list the settings for the queues. The setting
> that Lennart suggested was not set, so I added it and restarted the
> server. It still reports a policy violation of 128 > 70.
>
> This is the current setting for the queue low:
> Queue low
> queue_type = Execution
> Priority = 10
> total_jobs = 2
> state_count = Transit:0 Queued:2 Held:0 Waiting:0 Running:0 Exiting:0
> max_running = 10
> resources_max.ncpus = 70
> resources_max.nodect = 140
> resources_max.walltime = 96:00:00
> mtime = Wed Jul 18 11:33:12 2007
> resources_assigned.ncpus = 0
> resources_assigned.nodect = 0
> enabled = True
> started = True
>
> This is the information from PBS about one of the jobs waiting because
> of the policy violation:
> Resource_List.ncpus = 1
> Resource_List.nodect = 32
> Resource_List.nodes = 32:ppn=4
>
> What is the difference between .ncpus and .nodect? And which one does
> the Maui scheduler look at?
Your Torque configuration may be summarized with a snippet from the
server_priv/nodes file, for example,
n1 np=4
n2 np=4
.
.
n35 np=4
together with the output from the qmgr command "print server" (please note
that the "list server" command does not give the full configuration).
From your "list server" output, it looks like you have set
"resources_max.ncpus = 70". I propose that you remove this setting.
I do not use this resources_max.ncpus restriction myself, but it would not
surprise me if it gets Maui to limit your job size to 70 processors/cores.
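As a sketch, assuming the limit is set on queue "low" as in your
listing (add a matching "unset server" line if it is also set at the
server level):

qmgr -c 'unset queue low resources_max.ncpus'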
I see nothing related to your problems in your Maui configuration.
BTW, I do not think that you need to restart Torque when you make
configuration changes in qmgr. When making changes in the nodes file,
you do need to restart Torque (in a stop-change-start sequence).
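For a nodes file change, the stop-change-start sequence is roughly
this (assuming a default installation; qterm options vary between
versions):

qterm -t quick
# edit server_priv/nodes here
pbs_server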
Please post your Torque configuration ("print server") and your nodes
file (server_priv/nodes) if removing the resources_max.ncpus
configuration lines does not help.
Best wishes,
-- Lennart Karlsson <[EMAIL PROTECTED]>
National Supercomputer Centre in Linkoping, Sweden
http://www.nsc.liu.se