>Do you mean why isn't the job running, even though it seems that it *should* 
>be running?

Exactly...

>If so, I would say post the output of qstat -f for the job, and checkjob -v

mahmood@srv1:~$ qstat -f 49153
Job Id: 49153.srv1
    Job_Name = bwaves
    Job_Owner = mahmood@srv1
    job_state = Q
    queue = Long
    server = srv1
    Checkpoint = u
    ctime = Mon Sep 12 19:55:29 2011
    Error_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwave
        s/bwaves.e49153
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Mon Sep 12 19:55:29 2011
    Output_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwav
        es/bwaves_128.out
    Priority = 0
    qtime = Mon Sep 12 19:55:29 2011
    Rerunable = True
    Resource_List.nodect = 1
    Resource_List.nodes = node2
    Resource_List.walltime = 960:00:00
    Variable_List = PBS_O_QUEUE=Long,PBS_O_HOME=/home/mahmood,
        ...
    etime = Mon Sep 12 19:55:29 2011
    submit_args = tor
    fault_tolerant = False
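
For anyone who wants to compare these fields programmatically, here is a minimal Python sketch that turns "qstat -f" text into a dictionary. It assumes the layout shown above: attribute lines indented by four spaces, and wrapped values continued on a more deeply indented line (or a tab-indented one). The function name parse_qstat_f and the shortened sample are illustrative only and are not part of TORQUE.

```python
def parse_qstat_f(text):
    """Parse `qstat -f` text for one job into a dict of attribute -> value.

    Assumption: continuation lines are indented deeper than the four-space
    attribute lines (or start with a tab), as in the listing above.
    """
    attrs = {}
    key = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        if stripped.startswith("Job Id:"):
            attrs["Job Id"] = stripped.split(":", 1)[1].strip()
            key = None
        elif key is not None and ("\t" in indent or len(indent) > 4):
            # Wrapped value: glue it back onto the previous attribute.
            attrs[key] += stripped
        elif " = " in stripped:
            key, value = stripped.split(" = ", 1)
            key = key.strip()
            attrs[key] = value
    return attrs


# Shortened sample copied from the output above.
sample = """Job Id: 49153.srv1
    Job_Name = bwaves
    job_state = Q
    Error_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwave
        s/bwaves.e49153
    Resource_List.nodes = node2
"""

job = parse_qstat_f(sample)
print(job["job_state"], job["Resource_List.nodes"])
```

With the sample above, job["Error_Path"] holds the path rejoined into a single string.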

mahmood@srv1:~$ checkjob -v 49153
checking job 49153 (RM job '49153.srv1')

State: Idle
Creds:  user:mahmood  group:mahmood  class:Long  qos:DEFAULT
WallTime: 00:00:00 of 40:00:00:00
SubmitTime: Mon Sep 12 19:55:29
  (Time Queued  Total: 00:39:24  Eligible: 00:39:24)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
NodeCount: 0


IWD: [NONE]  Executable:  [NONE]
Bypass: 3  StartCount: 0
PartitionMask: [ALL]
Flags:       HOSTLIST RESTARTABLE
HostList:
  [node2:1]
PE:  1.00  StartPriority:  147
job can run in partition DEFAULT (8 procs available.  1 procs required)


>which you seem to have manually selected in your qsub statement

Yes, as you can see, I requested node2:
Resource_List.nodes = node2

and the output of "pbsnodes -l all" shows that this node is free:

mahmood@srv1:~$ pbsnodes -l all
srv1                  job-exclusive
node2                 free
node3                 job-exclusive
node4                 free
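
The same check can be scripted. The following is a rough Python sketch that reads the two-column "pbsnodes -l all" listing shown above and looks up the state of the requested node. The helper name parse_node_states is hypothetical, and the code only reports what pbsnodes printed; it says nothing about how Maui sees the node.

```python
def parse_node_states(text):
    """Map node name -> state string from `pbsnodes -l all` style output.

    Assumption: each non-empty line is "<node> <state>" separated by
    whitespace, as in the listing above.
    """
    states = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


# Listing copied from the output above.
listing = """srv1                  job-exclusive
node2                 free
node3                 job-exclusive
node4                 free
"""

states = parse_node_states(listing)
requested = "node2"  # from Resource_List.nodes in the qstat -f output
print(requested, "is", states.get(requested, "unknown"))
```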


Any idea why the job is not starting?

// Naderan *Mahmood;


----- Original Message -----
From: Steve Crusan <[email protected]>
To: Mahmood Naderan <[email protected]>
Cc: maui <[email protected]>
Sent: Monday, September 12, 2011 6:17 PM
Subject: Re: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


On Sep 12, 2011, at 5:01 AM, Mahmood Naderan wrote:

> 
> 
> Hi,
> I sent this email to the TORQUE mailing list, but it seems to be related to 
> Maui, so I am restating the problem here.
> 
> Can someone explain why qstat shows a job in "Q" but checkjob says 
> everything is normal?


Looking below, the job is queued in TORQUE, and idle in Maui (not running), so 
everything is normal.

Do you mean why isn't the job running, even though it seems that it *should* be 
running?

If so, I would say post the output of qstat -f for the job, and checkjob -v. 
This seems to be more or less a scheduler configuration issue, or possibly an 
issue with the node (which you seem to have manually selected in your qsub statement).



> 
> mahmood@srv1:416.gamess$ qstat 49003
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 49003.srv1                 gamess           mahmood                0 Q Long
> 
> 
> mahmood@srv1:416.gamess$ checkjob 49003
> checking job 49003
> 
> State: Idle
> Creds:  user:mahmood  group:mahmood  class:Long    qos:DEFAULT
> WallTime: 00:00:00 of 40:00:00:00
> SubmitTime: Sun Sep 11 09:51:26
>   (Time Queued  Total: 00:02:36  Eligible: 00:02:36)
> 
> Total Tasks: 1
> 
> Req[0]  TaskCount: 1  Partition: ALL
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> 
> 
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 0  StartCount: 0
> PartitionMask: [ALL]
> Flags:       HOSTLIST RESTARTABLE
> HostList:
>   [hawk:1]
> PE:  1.00  StartPriority:  129
> job can run in partition DEFAULT (3 procs available.  1 procs required)
> 
> Thanks
> // Naderan *Mahmood;
> 
> _______________________________________________
> mauiusers mailing list
> [email protected]
> http://www.supercluster.org/mailman/listinfo/mauiusers

----------------------
Steve Crusan
System Administrator
Center for Research Computing
University of Rochester
https://www.crc.rochester.edu/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJObg2IAAoJENS19LGOpgqKAnIIAKHvbLmV9Hs31IZ4AGHIOFG9
Wxp+qiXOnIMoKQQjhkkou1zVC4OKHnymcE/LxtiQcAuX+Lu8gd/GAR1tF5FeCF4g
m7go12yb5Dx97sHgl2SjmRY3duDkx6YMfOGgxCuiN+O5SdkUazuW8GPkW+HPPS7/
T3gDbG0jizZ6A5LzhJqgPyVC4LKkwYt5v9NQBs/f82ZOGqPusEWdJ4N5oaUYhyG/
OXSj/xmzMTCYCqfdOUZynq4ACQotRbNmY7wrV+Uc0qWUFtZv/RIwQ/O4P261E/1/
dfrVX3OEdz9FBy4uoNrgMyNxL2eOanNiKSlhHJnoM04zx0SkAYGDOeGPqYv/vi0=
=QcC7
-----END PGP SIGNATURE-----
