Re: [Mauiusers] Deferred jobs

Steve Young Thu, 11 Dec 2008 08:40:28 -0800

Hi Phillip,

How about checknode on the node it was trying to run on? Does it see the node OK? Or possibly pbsnodes -a <nodename> to make sure that torque is seeing the node properly? I'm just grasping at straws here =) ... If you run releasehold <jobid>, does the job run after that?


-Steve

On Dec 11, 2008, at 10:48 AM, Philip Peartree wrote:

I now have this problem on a different cluster (but again running torque and maui).
Checkjob for the job gives:

State: Idle  EState: Deferred
Creds:  user:mcdiypp2  group:nmrc  class:med_12h  qos:DEFAULT
WallTime: 00:00:00 of 6:00:00
SubmitTime: Thu Dec 11 15:24:45
 (Time Queued  Total: 00:19:55  Eligible: 00:00:01)

StartDate: -00:19:53  Thu Dec 11 15:24:47
Total Tasks: 32

Req[0]  TaskCount: 32  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       RESTARTABLE
job is deferred. Reason: RMFailure (cannot start job - RM failure, rc: 15041, msg: 'Execution server rejected request MSG=cannot send job to mom, state=PRERUN')
Holds:    Defer  (hold reason:  RMFailure)
PE:  32.00  StartPriority:  1
cannot select job 157 for partition DEFAULT (job hold active)
Having looked this up on Google, it seems this might be a torque problem, but the basic problem (as I see it) seems to be that two jobs are assigned to the same set of processors/nodes, and I thought that this was the job of maui. This has happened previously and resolved itself (admittedly while another problem was being sorted out).
I have checked the logs on the affected nodes and there is nothing to say whether they even got the job at all!
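
For anyone who wants to script this check rather than read checkjob output by eye, here is a minimal Python sketch that pulls out the fields that matter in a case like this one (state, effective state, defer reason, return code, hold type). The function name summarize_checkjob and the dictionary keys are hypothetical, and the sketch assumes checkjob output shaped like the listing above; other Maui versions may format these lines differently.

```python
import re

# Lines excerpted from the checkjob output quoted in this thread.
SAMPLE = """\
State: Idle  EState: Deferred
job is deferred. Reason: RMFailure (cannot start job - RM failure, rc: 15041, msg: 'Execution server rejected request MSG=cannot send job to mom, state=PRERUN')
Holds:    Defer  (hold reason:  RMFailure)
"""

def summarize_checkjob(text):
    """Extract a few diagnostic fields from checkjob text (hypothetical helper).

    Fields that cannot be found are reported as None.
    """
    patterns = {
        "state": r"^State:\s*(\S+)",    # scheduler-visible job state
        "estate": r"EState:\s*(\S+)",   # effective state, e.g. Deferred
        "reason": r"Reason:\s*(\S+)",   # defer reason, e.g. RMFailure
        "rc": r"rc:\s*(\d+)",           # return code reported by the resource manager
        "hold": r"^Holds:\s*(\S+)",     # hold type, e.g. Defer
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text, re.MULTILINE)
        summary[key] = match.group(1) if match else None
    return summary

if __name__ == "__main__":
    for field, value in summarize_checkjob(SAMPLE).items():
        print(f"{field}: {value}")
```

In practice the text would come from running checkjob <jobid> and capturing its output; the sketch only parses text and does not call Maui or torque itself.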
Quoting "Steve Young" <[EMAIL PROTECTED]>:
Hi,
        I was looking at the maui manual at:

http://www.clusterresources.com/products/maui/docs/11.1jobholds.shtml

What does checkjob tell you for that job?

-Steve

On Dec 11, 2008, at 9:40 AM, Philip Peartree wrote:
Does anyone have any ideas?

Quoting "Philip Peartree" <[EMAIL PROTECTED]>:
Hi,
I'm having a problem with a torque/maui setup (hence the mail to both lists). Submitted jobs are being deferred, and this primarily seems to be because they're all requesting the same resource (node24 at this point). A qrun seems to shift them onto a correct node.
My pbs_server log suggests that it's being rejected by the mom, and a look at the logs on the mom shows a rejection going on with code 15004 and the job in unexpected state TRANSICM.

Can anyone help?

Phil Peartree
University of Manchester

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
