I now have this problem on a different cluster (again running
Torque and Maui).
Checkjob for the job gives:
State: Idle EState: Deferred
Creds: user:mcdiypp2 group:nmrc class:med_12h qos:DEFAULT
WallTime: 00:00:00 of 6:00:00
SubmitTime: Thu Dec 11 15:24:45
(Time Queued Total: 00:19:55 Eligible: 00:00:01)
StartDate: -00:19:53 Thu Dec 11 15:24:47
Total Tasks: 32
Req[0] TaskCount: 32 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
Flags: RESTARTABLE
job is deferred. Reason: RMFailure (cannot start job - RM failure,
rc: 15041, msg: 'Execution server rejected request MSG=cannot send job
to mom, state=PRERUN')
Holds: Defer (hold reason: RMFailure)
PE: 32.00 StartPriority: 1
cannot select job 157 for partition DEFAULT (job hold active)
Searching Google suggests this might be a Torque problem, but the
underlying issue, as I see it, is that two jobs are being assigned to
the same set of processors/nodes, and I thought preventing that was
Maui's job. This has happened before and resolved itself (admittedly
while another problem was being sorted out).
I have checked the logs on the affected nodes and there is nothing to
indicate whether they even received the job!
Quoting "Steve Young" <[EMAIL PROTECTED]>:
Hi,
I was looking at the maui manual at:
http://www.clusterresources.com/products/maui/docs/11.1jobholds.shtml
What does checkjob tell you for that job?
-Steve
On Dec 11, 2008, at 9:40 AM, Philip Peartree wrote:
Does anyone have any ideas?
Quoting "Philip Peartree" <[EMAIL PROTECTED]>:
Hi,
I'm having a problem with a torque/maui setup (hence the mail to both
lists). Submitted jobs are being deferred, and the main reason seems to
be that they are all requesting the same resource (node24 at this
point). Running qrun on them manually seems to move them onto a
suitable node.
My pbs_server log suggests that the job is being rejected by the mom,
and the logs on the mom show a rejection with code 15004 and the job
in the unexpected state TRANSICM.
Can anyone help?
Phil Peartree
University of Manchester
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers