Hi All,
I am using torque-2.0.0p2 and maui-3.2.6p13, and noticed the following
behavior today:
- There are several jobs in the queue that are in the Q state. When I run
checkjob <jobid>, I get (among other things):
"job can run in partition DEFAULT (63 procs available. 1 procs required)"
but the job stays in Q indefinitely. As the message above indicates, this
is not a case of an unmet resource requirement.
- There is nothing untoward in the torque logs.
- I see several of these messages in maui.log:
MSysRegEvent(JOBCORRUPTION: job 'jobid' has the following idle node(s)
allocated: 'node114' ,0,0,1)
but these refer to the running jobs, not the queued jobs in question.
- I also see messages like these in the maui.log:
INFO: PBS node node114 set to state Idle (free)
INFO: node 'node114' changed states from Running to Idle
even though this node has 2 of its 4 procs busy. The same messages are
repeated for several nodes.
- Restarting torque and maui did not help either.
- If I run qrun <jobid> for the stuck jobs, I get:
qrun: Resource temporarily unavailable <jobid>
- But if I run runjob <jobid>, the jobs do start!
I am unable to correlate all this information. Does anyone know what could
be going wrong, or where else I can look?
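
In case it helps anyone trying to correlate the two kinds of maui.log
messages quoted above, here is a minimal Python sketch that lists the nodes
appearing in both a JOBCORRUPTION message and a Running-to-Idle state change.
It assumes each message sits on a single line in the real log (the quotes
above are wrapped) and that the wording matches what I pasted. The function
names summarize and suspect_nodes are only illustrative and are not part of
maui or torque.

```python
import re
from collections import Counter

# Patterns modelled on the messages quoted in this post. Real maui.log lines
# may carry timestamps or other prefixes, so we search instead of matching
# from the start of the line. (Assumption: one message per line.)
CORRUPTION_RE = re.compile(
    r"JOBCORRUPTION:\s+job '([^']+)' has the following idle node\(s\)\s+"
    r"allocated:\s+'([^']+)'"
)
STATE_RE = re.compile(r"node '([^']+)' changed states from (\w+) to (\w+)")


def summarize(lines):
    """Count, per node, JOBCORRUPTION events and Running -> Idle changes."""
    corrupt = Counter()
    flips = Counter()
    for line in lines:
        m = CORRUPTION_RE.search(line)
        if m:
            corrupt[m.group(2)] += 1  # group 2 is the node name
        m = STATE_RE.search(line)
        if m and m.group(2) == "Running" and m.group(3) == "Idle":
            flips[m.group(1)] += 1
    return corrupt, flips


def suspect_nodes(lines):
    """Return the sorted names of nodes that show up in both message types."""
    corrupt, flips = summarize(lines)
    return sorted(set(corrupt) & set(flips))
```

It can be fed any iterable of lines, for example an open file object for
maui.log (the path depends on the local installation).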
Thanks.
-Neel