Hi.

We've run into an issue submitting jobs that request more than 4096 tasks on 
a Torque/Maui combination. The following job is accepted and runs:

$ qsub -I -lnodes=170:ppn=24

When we go larger by one node:

$ qsub -I -lnodes=171:ppn=24

The job is in the blocked queue with a state of Idle and the following message 
in checkjob:

cannot select job 104 for partition DEFAULT (NodeCount)
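
For reference, the two requests fall on either side of 4096 if one task is 
counted per requested core (nodes * ppn). That counting rule is my assumption, 
and task_count below is just a throwaway helper, not part of Torque or Maui. 
A quick shell check:

```shell
# Task count for a request, assuming tasks = nodes * ppn.
# task_count is a hypothetical helper, not a Torque or Maui command.
task_count() {
    echo $(( $1 * $2 ))
}

limit=4096   # the threshold we appear to be hitting

for nodes in 170 171; do
    tasks=$(task_count "$nodes" 24)
    if [ "$tasks" -gt "$limit" ]; then
        echo "nodes=$nodes:ppn=24 -> $tasks tasks (over $limit)"
    else
        echo "nodes=$nodes:ppn=24 -> $tasks tasks (within $limit)"
    fi
done
```

So 170 nodes gives 4080 tasks and 171 nodes gives 4104, which lines up with 
the point where the job stops being selected.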

I did some searching and found information about limits on the number of jobs, 
but not much on the number of tasks per job. I tried increasing MAX_MTASK from 
the default of 4096 to 16384 to cover the core count on our cluster. This works 
in that we can submit jobs larger than 4096 tasks, but Maui crashes within 
minutes once we start submitting jobs. These are the two parameters we're 
changing before rebuilding Maui:

sed -i '/MMAX_JOB/ s/4096/8192/g' ./include/msched.h 
sed -i '/MAX_MTASK/ s/4096/16384/g' ./include/msched-common.h
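
To rule out the sed expression itself, here is a small dry run of the 
MAX_MTASK edit on a scratch file. The #define lines are made-up stand-ins 
(OTHER_LIMIT is hypothetical), not the real contents of msched-common.h, and 
it assumes GNU sed for the -i option:

```shell
# Dry run of the MAX_MTASK substitution on a scratch file.
# The #define lines are hypothetical stand-ins, not copied from Maui.
tmp=$(mktemp)
printf '#define MAX_MTASK 4096\n#define OTHER_LIMIT 4096\n' > "$tmp"

# Same expression as above: only lines matching MAX_MTASK are rewritten.
sed -i '/MAX_MTASK/ s/4096/16384/g' "$tmp"

result=$(cat "$tmp")
echo "$result"
rm -f "$tmp"
```

Only the line containing MAX_MTASK should change; other lines that happen to 
contain 4096 are left alone.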

The MMAX_JOB change is already in our current build and has no adverse effect 
on Maui; the crashes only appear once we increase MAX_MTASK.

Is it possible we're missing another parameter change, or could this be a 
ulimit issue? Here are the ulimits on the system where Maui is running:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1196032
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1196032
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Thanks.

Steve
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
