-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sep 12, 2011, at 12:27 PM, Mahmood Naderan wrote:
>> Do you mean why isn't the job running, even though it seems that it *should*
>> be running?
>
> Exactly...
>
>> If so, I would say post the output of qstat -f for the job, and checkjob -v
>
> mahmood@srv1:~$ qstat -f 49153
> Job Id: 49153.srv1
>     Job_Name = bwaves
>     Job_Owner = mahmood@srv1
>     job_state = Q
>     queue = Long
>     server = srv1
>     Checkpoint = u
>     ctime = Mon Sep 12 19:55:29 2011
>     Error_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwave
>         s/bwaves.e49153
>     Hold_Types = n
>     Join_Path = oe
>     Keep_Files = n
>     Mail_Points = a
>     mtime = Mon Sep 12 19:55:29 2011
>     Output_Path = srv1:/home/mahmood/multi2sim-3.0.3/410.bwav
>         es/bwaves_128.out
>     Priority = 0
>     qtime = Mon Sep 12 19:55:29 2011
>     Rerunable = True
>     Resource_List.nodect = 1
>     Resource_List.nodes = node2
>     Resource_List.walltime = 960:00:00
>     Variable_List = PBS_O_QUEUE=Long,PBS_O_HOME=/home/mahmood,
>     ...
>     etime = Mon Sep 12 19:55:29 2011
>     submit_args = tor
>     fault_tolerant = False
>
> mahmood@srv1:~$ checkjob -v 49153
> checking job 49153 (RM job '49153.srv1')
>
> State: Idle
> Creds: user:mahmood group:mahmood class:Long qos:DEFAULT
> WallTime: 00:00:00 of 40:00:00:00
> SubmitTime: Mon Sep 12 19:55:29
>   (Time Queued Total: 00:39:24 Eligible: 00:39:24)
>
> Total Tasks: 1
>
> Req[0] TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
> Exec: '' ExecSize: 0 ImageSize: 0
> Dedicated Resources Per Task: PROCS: 1
> NodeAccess: SHARED
> NodeCount: 0
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 3 StartCount: 0
> PartitionMask: [ALL]
> Flags: HOSTLIST RESTARTABLE
> HostList:
>   [node2:1]
> PE: 1.00 StartPriority: 147
> job can run in partition DEFAULT (8 procs available. 1 procs required)

There has got to be a reason why the job won't start even though resources
are available. I was hoping that checkjob -v would show the node information,
but maybe it's different for Maui.

Can you run checkjob -v -n <nodeid> <jobid>?

Either the specific node itself is having problems, or Maui is not starting
the job. Do you see anything relevant in your /var/spool/maui/logs/maui.log
file? If not, I would increase the verbosity of the logging and restart the
maui service.

>
>> which you seem to have manually selected in your qsub statement
>
> Yes, As you can see I requested node2
> Resource_List.nodes = node2
>
> and the output of "pbsnodes -l all" shows that this node is free
>
> mahmood@srv1:~$ pbsnodes -l all
> srv1 job-exclusive
> node2 free
> node3 job-exclusive
> node4 free
>
>
> Any idea about that?
>
> // Naderan *Mahmood;
>
>
> ----- Original Message -----
> From: Steve Crusan <[email protected]>
> To: Mahmood Naderan <[email protected]>
> Cc: maui <[email protected]>
> Sent: Monday, September 12, 2011 6:17 PM
> Subject: Re: [Mauiusers] Job is in 'Q' but checkjob shows it is running (!)
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> On Sep 12, 2011, at 5:01 AM, Mahmood Naderan wrote:
>
>>
>>
>> Hi,
>> I sent this email to torque mailing list but seems that it is related to
>> maui. So I restate the problem here.
>>
>> Can someone explain why the qstat shows a job in "Q" but checkjob says
>> everything is normal?
>
>
> Looking below, the job is queued in TORQUE, and idle in Maui (not running),
> so everything is normal.
>
> Do you mean why isn't the job running, even though it seems that it *should*
> be running?
>
> If so, I would say post the output of qstat -f for the job, and checkjob -v.
> This seems to be more or less a scheduler configuration, or possibly an issue
> with the node (which you seem to have manually selected in your qsub
> statement).
>
>
>
>>
>> mahmood@srv1:416.gamess$ qstat 49003
>> Job id                    Name             User            Time Use S Queue
>> ------------------------- ---------------- --------------- -------- - -----
>> 49003.srv1                gamess           mahmood                0 Q Long
>>
>>
>> mahmood@srv1:416.gamess$ checkjob 49003
>> checking job 49003
>>
>> State: Idle
>> Creds: user:mahmood group:mahmood class:Long qos:DEFAULT
>> WallTime: 00:00:00 of 40:00:00:00
>> SubmitTime: Sun Sep 11 09:51:26
>>   (Time Queued Total: 00:02:36 Eligible: 00:02:36)
>>
>> Total Tasks: 1
>>
>> Req[0] TaskCount: 1 Partition: ALL
>> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
>> Opsys: [NONE] Arch: [NONE] Features: [NONE]
>>
>>
>> IWD: [NONE] Executable: [NONE]
>> Bypass: 0 StartCount: 0
>> PartitionMask: [ALL]
>> Flags: HOSTLIST RESTARTABLE
>> HostList:
>>   [hawk:1]
>> PE: 1.00 StartPriority: 129
>> job can run in partition DEFAULT (3 procs available. 1 procs required)
>>
>> Thanks
>> // Naderan *Mahmood;
>>
>> _______________________________________________
>> mauiusers mailing list
>> [email protected]
>> http://www.supercluster.org/mailman/listinfo/mauiusers
>
> ----------------------
> Steve Crusan
> System Administrator
> Center for Research Computing
> University of Rochester
> https://www.crc.rochester.edu/
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
> Comment: GPGTools - http://gpgtools.org
>
> iQEcBAEBAgAGBQJObg2IAAoJENS19LGOpgqKAnIIAKHvbLmV9Hs31IZ4AGHIOFG9
> Wxp+qiXOnIMoKQQjhkkou1zVC4OKHnymcE/LxtiQcAuX+Lu8gd/GAR1tF5FeCF4g
> m7go12yb5Dx97sHgl2SjmRY3duDkx6YMfOGgxCuiN+O5SdkUazuW8GPkW+HPPS7/
> T3gDbG0jizZ6A5LzhJqgPyVC4LKkwYt5v9NQBs/f82ZOGqPusEWdJ4N5oaUYhyG/
> OXSj/xmzMTCYCqfdOUZynq4ACQotRbNmY7wrV+Uc0qWUFtZv/RIwQ/O4P261E/1/
> dfrVX3OEdz9FBy4uoNrgMyNxL2eOanNiKSlhHJnoM04zx0SkAYGDOeGPqYv/vi0=
> =QcC7
> -----END PGP SIGNATURE-----
>

----------------------
Steve Crusan
System Administrator
Center for Research Computing
University of Rochester
https://www.crc.rochester.edu/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJObkjnAAoJENS19LGOpgqKwwQH/26RwQZX1BG/M3V/PztkOpPs
CwshkSuBkQGrNqshY6/BenrZpXHGgEYGbqYyFm29NWMyNQ1Vm33mfb0rq84DBkXk
gbME5qwg3uKeATUGuBQoMxdy/JEu1TdqDx4FNwLh8/wLxzhmJcQqatEX4qvEgJWP
oT3m0j29rgENLfVKpZ40P7vHAPafJrnTAQjPsqmoZLnkK0dGOD/zD5T/RiMBKLar
harduBX6s9FpKeHJTwEYGqBdMgxu1nBQ3wna+Tmmjq5HXxdlzlT7HfQSYzWQxtI2
kXU/1S6kaz1AXVUCsJt42MGbmWhAwCBbVP5RCfHvXB6pulMXyOinRDeoYNzc7HU=
=eijX
-----END PGP SIGNATURE-----
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
