> unset the ncpus qmgr settings and don't request ncpus. Just use the ppn
> syntax.
Maui still kills the job with the same error:
12/29 10:05:49 INFO: job 850440 exceeds requested proc limit (15.60 > 1.00)
12/29 10:05:49 MSysRegEvent(JOBRESVIOLATION: job '850440' in state
'Running' has exceeded PROC resource limit (1560 > 100) (action CANCEL
will be taken) job start time: Tue Dec 29 10:05:18
Here's the queue definition:
queue_type = Execution
total_jobs = 1
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
max_running = 400
resources_max.nodect = 12
resources_max.nodes = 12:ppn=16
resources_min.nodect = 1
resources_min.nodes = 1:ppn=1
resources_default.neednodes = pir
resources_default.nodes = 1:ppn=1
resources_default.walltime = 01:00:00
acl_group_enable = True
acl_groups = lab4
acl_group_sloppy = True
mtime = Tue Dec 29 10:02:41 2009
enabled = True
started = True
Here's the PBS script:
#!/bin/tcsh
#PBS -M [email protected]
#PBS -m bae
#PBS -l nodes=1:ppn=16
#PBS -l mem=2000mb
#PBS -l walltime=00:10:00
#PBS -o nqpbs.out
#PBS -j oe
cd ~/nqueens/pthreads
time ./nq-gcc -q -n 20 -t 16
Here's the output of qstat -f for one of the jobs:
Job Id: 850440.piranha
Job_Name = nq2.pbs
Job_Owner = sab...@piranha
resources_used.cput = 00:02:36
resources_used.mem = 3636kb
resources_used.vmem = 161844kb
resources_used.walltime = 00:00:10
job_state = R
queue = pir
server = piranha
Checkpoint = u
ctime = Tue Dec 29 10:05:17 2009
Error_Path = piranha:/home/sabujp/nqueens/pthreads/nq2.pbs.e850440
exec_host = pir25/15+pir25/14+pir25/13+pir25/12+pir25/11+pir25/10+pir25/9+
pir25/8+pir25/7+pir25/6+pir25/5+pir25/4+pir25/3+pir25/2+pir25/1+pir25/
0
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = bae
Mail_Users = [email protected]
mtime = Tue Dec 29 10:05:18 2009
Output_Path = piranha:/home/sabujp/nqueens/pthreads/nqpbs.out
Priority = 0
qtime = Tue Dec 29 10:05:17 2009
Rerunable = True
Resource_List.mem = 2000mb
Resource_List.nodect = 1
Resource_List.nodes = 1:ppn=16
Resource_List.walltime = 00:10:00
session_id = 27354
Variable_List = PBS_O_HOME=/home/sabujp,PBS_O_LANG=en_US.UTF-8,
PBS_O_LOGNAME=sabujp,
PBS_O_PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/sbin:/usr/sbi
n:/sb/apps/noarch/bin:/usr/kerberos/bin:/sb/apps/Linux/bin:.:/sb/apps/
Linux/opt/bin:/home/sabujp/bin,PBS_O_MAIL=/var/spool/mail/sabujp,
PBS_O_SHELL=/bin/tcsh,PBS_O_HOST=piranha,PBS_SERVER=piranha,
PBS_O_WORKDIR=/home/sabujp/nqueens/pthreads,PBS_O_QUEUE=pir
etime = Tue Dec 29 10:05:17 2009
submit_args = nq2.pbs
start_time = Tue Dec 29 10:05:18 2009
start_count = 1
fault_tolerant = False
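For what it's worth, the 15.60 in the Maui log looks like it is simply resources_used.cput divided by resources_used.walltime from the qstat output above (00:02:36 over 00:00:10). The sketch below redoes that arithmetic and compares it with the 1:ppn=16 request. That Maui derives the number this way is my assumption, not something I have confirmed in the Maui source, and the helper names are made up for illustration.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string (as printed by qstat -f) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def avg_procs_used(cput: str, walltime: str) -> float:
    """Average number of busy processors: CPU time / elapsed time.

    Assumption: this mirrors how Maui arrives at the "15.60" figure.
    """
    wall = hms_to_seconds(walltime)
    if wall == 0:
        raise ValueError("walltime is zero; utilization is undefined")
    return hms_to_seconds(cput) / wall


def procs_requested(nodes_spec: str) -> int:
    """Total processors for a simple 'N:ppn=M' spec (hypothetical helper).

    Only handles a single numeric node count with an optional ppn property.
    """
    count, _, rest = nodes_spec.partition(":")
    ppn = 1
    for prop in rest.split(":"):
        if prop.startswith("ppn="):
            ppn = int(prop[len("ppn="):])
    return int(count) * ppn


# Values copied from the qstat -f output for job 850440.
usage = avg_procs_used("00:02:36", "00:00:10")
requested = procs_requested("1:ppn=16")

print(f"average procs used: {usage:.2f}")
print(f"procs requested:    {requested}")
print(f"within request:     {usage <= requested}")
```

If that reading is right, the job is using fewer processors than the 16 it asked for, and the real problem is that Maui believes the limit is 1.00, i.e. it does not appear to be picking up ppn=16 from the request.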
Someone else suggested that this setting might be the cause:
> NODECFG[pir23] PROCSPEED=2930 SPEED=1.00 <----
I think it is highly unlikely that this is causing the error, but I set
SPEED=20.00 anyway, and the job was still killed for the same reason.