Hi All,

Apologies for cross-posting, but I wasn't sure which list this fits best. I have a job that is stuck in the queued state in Torque: qstat shows its state as queued (Q), but Maui says the job is active, i.e. showq lists it as active and running. In showq the job's runtime is not counting down, and the job is definitely not running on any of the nodes it is supposed to be running on. diagnose -j reports:

Name  State    Par  Proc  QOS  WCLimit   R  Min  User      Group  Account  QueuedTime  Network  Opsys   Arch    Mem  Disk  Procs  Class         Features
21    Running  DEF  144   DEF  00:02:00  1  144  mcdiypp2  nmrc   -        00:28:33    [NONE]   [NONE]  [NONE]  >=0  >=0   NC0    [short_2h:1]  [NONE]
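Because the diagnose -j line is easy to misread, here is a minimal Python sketch that pairs each column name with its value for the job row above. The function name parse_diagnose_row is my own and purely illustrative. It assumes the fields are whitespace-separated and contain no spaces, which holds for this row but may not hold for every Maui version or job.

```python
from typing import Dict

# Column header and job row copied from the `diagnose -j` output above.
HEADER = ("Name State Par Proc QOS WCLimit R Min User Group Account "
          "QueuedTime Network Opsys Arch Mem Disk Procs Class Features")
ROW = ("21 Running DEF 144 DEF 00:02:00 1 144 mcdiypp2 nmrc - 00:28:33 "
       "[NONE] [NONE] [NONE] >=0 >=0 NC0 [short_2h:1] [NONE]")


def parse_diagnose_row(header: str, row: str) -> Dict[str, str]:
    """Hypothetical helper: map each column name to its value.

    Assumes whitespace-separated fields with no embedded spaces.
    """
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError(f"{len(names)} columns but {len(values)} values")
    return dict(zip(names, values))


if __name__ == "__main__":
    job = parse_diagnose_row(HEADER, ROW)
    # Maui's view of the job: its state, processor count and wallclock limit.
    print(job["State"], job["Proc"], job["WCLimit"])
```

Proc comes out as 144, which is consistent with the 18 nodes x 8 ppn requested in the qstat -f output.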

And qstat -f shows:

Job Id: 21.steel.mib.man.ac.uk
    Job_Name = qsubtest.com
    Job_Owner = [EMAIL PROTECTED]
    job_state = Q
    queue = short_2h
    server = steel.mib.man.ac.uk
    Checkpoint = u
    ctime = Wed Sep 24 14:08:27 2008
    Error_Path = steel.mib.man.ac.uk:/home/mcdiypp2/qsubtest.com.e21
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Wed Sep 24 14:37:54 2008
    Output_Path = steel.mib.man.ac.uk:/home/mcdiypp2/qsubtest.com.o21
    Priority = 0
    qtime = Wed Sep 24 14:08:27 2008
    Rerunable = True
    Resource_List.nodect = 18
    Resource_List.nodes = 18:ppn=8
    Resource_List.walltime = 00:02:00
    Variable_List = PBS_O_HOME=/home/mcdiypp2,PBS_O_LANG=en_GB.UTF-8,
        PBS_O_LOGNAME=mcdiypp2,
        PBS_O_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/b
        in:/usr/games:/opt/torque/2.3.3/bin:/opt/torque/2.3.3/sbin:/opt/maui/3
        .2.6p19/bin:/opt/maui/3.2.6p19/sbin:/opt/openmpi/1.2.6/bin,
        PBS_O_MAIL=/var/mail/mcdiypp2,PBS_O_SHELL=/bin/bash,
        PBS_SERVER=steel.mib.man.ac.uk,PBS_O_HOST=steel.mib.man.ac.uk,
        PBS_O_WORKDIR=/home/mcdiypp2,PBS_O_QUEUE=route
    etime = Wed Sep 24 14:08:27 2008
    exit_status = -3
    submit_args = qsubtest.com
    start_time = Wed Sep 24 14:08:28 2008
    start_count = 1756
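To check the same disagreement programmatically, here is a minimal Python sketch that parses an excerpt of the qstat -f output above into a dictionary and flags a job that Torque reports as queued while Maui reports it as running. The helper names parse_qstat_f and looks_stuck are hypothetical, and the Maui state is passed in by hand (taken from the State column of diagnose -j) rather than queried. It assumes the layout seen here: attributes indented four spaces as "name = value", and long values wrapped onto deeper-indented (or tab-indented) continuation lines that can split a token mid-word (e.g. "/b" + "in"), so continuations are joined with no separator.

```python
from typing import Dict, Optional

# Excerpt of the `qstat -f` output above, including one wrapped attribute.
QSTAT_F = """\
Job Id: 21.steel.mib.man.ac.uk
    job_state = Q
    queue = short_2h
    Resource_List.nodes = 18:ppn=8
    Variable_List = PBS_O_HOME=/home/mcdiypp2,PBS_O_LANG=en_GB.UTF-8,
        PBS_O_LOGNAME=mcdiypp2,
        PBS_O_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/b
        in:/usr/games
    exit_status = -3
    start_count = 1756
"""


def parse_qstat_f(text: str) -> Dict[str, str]:
    """Hypothetical helper: parse one job's `qstat -f` block into a dict."""
    attrs: Dict[str, str] = {}
    last: Optional[str] = None  # attribute that continuation lines extend
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("Job Id:"):
            attrs["Job Id"] = line.split(":", 1)[1].strip()
            last = None
            continue
        indent = len(line) - len(line.lstrip(" "))
        if line.startswith("\t") or indent > 4:
            # Continuation line: qstat may split a token mid-word, so join
            # the pieces with no separator.
            if last is not None:
                attrs[last] += line.strip()
        elif " = " in line:
            name, value = line.strip().split(" = ", 1)
            attrs[name] = value
            last = name
    return attrs


def looks_stuck(attrs: Dict[str, str], maui_state: str) -> bool:
    """True when Torque reports the job as queued but Maui says it is running."""
    return attrs.get("job_state") == "Q" and maui_state == "Running"


if __name__ == "__main__":
    job = parse_qstat_f(QSTAT_F)
    # "Running" is typed in by hand from the State column of `diagnose -j`.
    print(job["job_state"], job["start_count"], looks_stuck(job, "Running"))
```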


I also don't understand why the Priority is zero, since Maui lists the job's start priority as 60. Something appears not to be communicating somewhere. Could someone shed some light on this?

Philip Peartree

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers