Re: [Mauiusers] MSysRegEvent(JOBRESVIOLATION: job '850416' in state 'Running' has exceeded PROC resource limit (1618 > 100) (action CANCEL will be taken)

Sabuj Pattanayek Tue, 29 Dec 2009 08:18:22 -0800

> unset the ncpus qmgr settings and don't request ncpus.  Just use the ppn 
> syntax.


Maui still kills the job with the same error:

12/29 10:05:49 INFO:     job 850440 exceeds requested proc limit (15.60 > 1.00)
12/29 10:05:49 MSysRegEvent(JOBRESVIOLATION:  job '850440' in state
'Running' has exceeded PROC resource limit (1560 > 100) (action CANCEL
will be taken)  job start time: Tue Dec 29 10:05:18
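
The two log lines appear to report the same measurement in different units: 15.60 > 1.00 in the INFO line and 1560 > 100 in the event line. A short Python sketch that extracts both pairs and compares them, assuming the event line is expressed in hundredths of a processor (the factor of 100 is inferred from these two lines only, and `parse_pair` is a hypothetical helper, not part of Maui):

```python
import re

info_line = "12/29 10:05:49 INFO:     job 850440 exceeds requested proc limit (15.60 > 1.00)"
event_line = ("MSysRegEvent(JOBRESVIOLATION:  job '850440' in state 'Running' "
              "has exceeded PROC resource limit (1560 > 100) (action CANCEL will be taken)")


def parse_pair(line):
    """Return (used, limit) from the first '(X > Y)' group in a log line."""
    match = re.search(r"\(([\d.]+) > ([\d.]+)\)", line)
    if match is None:
        raise ValueError("no '(used > limit)' pair found")
    return float(match.group(1)), float(match.group(2))


info_used, info_limit = parse_pair(info_line)
event_used, event_limit = parse_pair(event_line)

# Assumption: the event line reports hundredths of a processor.
SCALE = 100
scaled_used = event_used / SCALE
scaled_limit = event_limit / SCALE

print(info_used, scaled_used)    # usage from each line, in processors
print(info_limit, scaled_limit)  # limit from each line, in processors
```

If that reading is right, both lines say the job is held to a limit of one processor.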

Here's the queue definition:

queue_type = Execution
        total_jobs = 1
        state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
        max_running = 400
        resources_max.nodect = 12
        resources_max.nodes = 12:ppn=16
        resources_min.nodect = 1
        resources_min.nodes = 1:ppn=1
        resources_default.neednodes = pir
        resources_default.nodes = 1:ppn=1
        resources_default.walltime = 01:00:00
        acl_group_enable = True
        acl_groups = lab4
        acl_group_sloppy = True
        mtime = Tue Dec 29 10:02:41 2009
        enabled = True
        started = True

Here's the PBS script:

#!/bin/tcsh
#PBS -M [email protected]
#PBS -m bae
#PBS -l nodes=1:ppn=16
#PBS -l mem=2000mb
#PBS -l walltime=00:10:00
#PBS -o nqpbs.out
#PBS -j oe

cd ~/nqueens/pthreads
time ./nq-gcc -q -n 20 -t 16

Here's the output of qstat -f for one of the jobs:

Job Id: 850440.piranha
    Job_Name = nq2.pbs
    Job_Owner = sab...@piranha
    resources_used.cput = 00:02:36
    resources_used.mem = 3636kb
    resources_used.vmem = 161844kb
    resources_used.walltime = 00:00:10
    job_state = R
    queue = pir
    server = piranha
    Checkpoint = u
    ctime = Tue Dec 29 10:05:17 2009
    Error_Path = piranha:/home/sabujp/nqueens/pthreads/nq2.pbs.e850440
    exec_host = pir25/15+pir25/14+pir25/13+pir25/12+pir25/11+pir25/10+pir25/9+
        pir25/8+pir25/7+pir25/6+pir25/5+pir25/4+pir25/3+pir25/2+pir25/1+pir25/
        0
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = bae
    Mail_Users = [email protected]
    mtime = Tue Dec 29 10:05:18 2009
    Output_Path = piranha:/home/sabujp/nqueens/pthreads/nqpbs.out
    Priority = 0
    qtime = Tue Dec 29 10:05:17 2009
    Rerunable = True
    Resource_List.mem = 2000mb
    Resource_List.nodect = 1
    Resource_List.nodes = 1:ppn=16
    Resource_List.walltime = 00:10:00
    session_id = 27354
    Variable_List = PBS_O_HOME=/home/sabujp,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=sabujp,
        PBS_O_PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/sbin:/usr/sbi
        n:/sb/apps/noarch/bin:/usr/kerberos/bin:/sb/apps/Linux/bin:.:/sb/apps/
        Linux/opt/bin:/home/sabujp/bin,PBS_O_MAIL=/var/spool/mail/sabujp,
        PBS_O_SHELL=/bin/tcsh,PBS_O_HOST=piranha,PBS_SERVER=piranha,
        PBS_O_WORKDIR=/home/sabujp/nqueens/pthreads,PBS_O_QUEUE=pir
    etime = Tue Dec 29 10:05:17 2009
    submit_args = nq2.pbs
    start_time = Tue Dec 29 10:05:18 2009
    start_count = 1
    fault_tolerant = False
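
The 15.60 figure in the Maui log happens to equal resources_used.cput divided by resources_used.walltime from this qstat output. A minimal Python sketch of that arithmetic, assuming the figure is an average of busy processors computed this way (an inference from the numbers above, not confirmed against the Maui source; `hms_to_seconds` is a hypothetical helper):

```python
def hms_to_seconds(hms):
    """Convert an HH:MM:SS string, as printed by qstat -f, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


cput = hms_to_seconds("00:02:36")      # resources_used.cput
walltime = hms_to_seconds("00:00:10")  # resources_used.walltime

# Average number of processors kept busy over the job's lifetime so far.
avg_procs_busy = cput / walltime

requested_ppn = 16  # from Resource_List.nodes = 1:ppn=16

print(f"{avg_procs_busy:.2f}")          # prints 15.60
print(avg_procs_busy <= requested_ppn)  # prints True
```

On that reading the usage fits within the 16 processors requested with ppn=16, while the limit Maui reports is 1.00.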

Someone else suggested that this setting might be the cause:

> NODECFG[pir23] PROCSPEED=2930 SPEED=1.00 <----

Although it is highly unlikely that this is causing the error, I set
SPEED=20.00 anyway, and the job was still killed for the same reason.
