hi all,

after reading some more code, it seems you also need to set the (undocumented) parameter
JFACTION=DEFER
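
for reference, a rough sketch of how this might look in maui.cfg. this assumes JFACTION is set as an AMCFG attribute next to DEFERJOBONFAILURE (the code suggests so, but it is undocumented), that the allocation manager is labelled 'bank', and that it is declared with TYPE, HOST and PORT; the host and port are the ones from the log extract below. untested, adapt to your own setup:

```
# maui.cfg (sketch, the label 'bank' and the attribute placement are assumptions)
AMCFG[bank] TYPE=GOLD HOST=head1.x.y.z PORT=7112
AMCFG[bank] DEFERJOBONFAILURE=TRUE
AMCFG[bank] JFACTION=DEFER
```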

stijn

Stijn De Weirdt wrote:
hi all,

we are doing some testing with gold and maui (maui snap.1212617145).

one of the things we can't get to work is the 'job reservation at job start time' policy. (charging when the job is finished works, so i don't suspect anything is wrong with gold.)

when there is a bank failure, jobs start to run no matter what we try.

the maui admin guide states that there is a parameter, DEFERJOBONFAILURE, that should deal with this (i.e. setting it to TRUE should keep jobs in state Q). it is not clear whether this applies to any bank failure or only when the AM can't be reached, but in both cases it doesn't seem to work ;) (logfile extract at the bottom).

what is even more bizarre, when setting this parameter, maui says (loglevel 9):

08/13 16:41:04 INFO:     AMCFG[0] set to DEFERJOBONFAILURE=TRUE
08/13 16:41:04 MUGetIndex(DEFERJOBONFAILURE,ValList,0)
08/13 16:41:04 WARNING:  AM attribute 'DEFERJOBONFAILURE' not handled

i grepped the maui code for anything related and also found BANKDEFERONJOBFAILURE (mind the subtle difference in naming), which has a default value of FALSE. so i changed that default to TRUE and rebuilt maui, but got the same result, so maybe it's something else.

hints are welcome.

many thanks,

stijn



logfiles:

from maui.log with loglevel 9:

08/13 17:58:46 ERROR: cannot connect to allocation-manager server 'head1.x.y.z':7112
08/13 17:58:46 MSysRegEvent(RMFAILURE: cannot connect to allocation-manager server head1.x.y.z:7112 (command: '<XML>')
,0,0,1)
08/13 17:58:46 MSysLaunchAction(ASList,1)
08/13 17:58:46 INFO:     scheduler action 1 disabled
08/13 17:58:46 INFO:     command response 'NULL'
08/13 17:58:46 ALERT:    no job data available
08/13 17:58:46 MSUDisconnect(S)
08/13 17:58:46 ALERT:    cannot extract status
08/13 17:58:46 ALERT:    cannot reserve allocation for job
08/13 17:58:46 WARNING: cannot reserve allocation for job '121', reason: BankFailure
08/13 17:58:46 MRMJobStart(121,Msg,SC)
08/13 17:58:46 MPBSJobStart(121,torque,Msg,SC)


08/13 15:10:11 WARNING:  request failed
08/13 15:10:11 ALERT: request failed with status code 740 (Project account8 does not exist)
08/13 15:10:11 MSUDisconnect(S)
08/13 15:10:11 ERROR: cannot receive response from allocation-manager server 'head1.x.y.z':7112
08/13 15:10:11 MSysRegEvent(FAILURE: cannot receive response from allocation-manager server head1.x.y.z:7112 (cmd: '<XML>')
,0,0,1)
08/13 15:10:11 MSysLaunchAction(ASList,1)
08/13 15:10:11 INFO:     command response 'NULL'
08/13 15:10:11 ALERT:    no job data available
08/13 15:10:11 ALERT:    cannot extract status
08/13 15:10:11 ALERT:    cannot reserve allocation for job
08/13 15:10:11 WARNING: cannot reserve allocation for job '107', reason: BankFailure
08/13 15:10:11 MRMJobStart(107,Msg,SC)
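
to check a larger maui.log for the same symptom, something like the following python sketch could help. it is not part of maui; the function name and the two regular expressions are made up for illustration and are based only on the log lines quoted above, so other maui versions or loglevels may need different patterns. it reports jobs for which a failed allocation reservation is followed by an MRMJobStart call:

```python
import re

# Patterns derived from the log extracts above (assumed format, not a maui API).
RESERVE_FAIL = re.compile(r"cannot reserve allocation for job '([^']+)', reason: (\w+)")
JOB_START = re.compile(r"MRMJobStart\(([^,]+),")


def started_despite_failure(lines):
    """Return (job_id, reason) pairs for jobs started after a failed reservation."""
    failed = {}  # job id -> reason of the most recent reservation failure
    hits = []
    for line in lines:
        m = RESERVE_FAIL.search(line)
        if m:
            failed[m.group(1)] = m.group(2)
            continue
        m = JOB_START.search(line)
        if m and m.group(1) in failed:
            # The job was started although its reservation had just failed.
            hits.append((m.group(1), failed.pop(m.group(1))))
    return hits


sample = [
    "08/13 17:58:46 WARNING: cannot reserve allocation for job '121', reason: BankFailure",
    "08/13 17:58:46 MRMJobStart(121,Msg,SC)",
]
print(started_despite_failure(sample))
```

for a real log, pass an open file object instead of the sample list, e.g. started_despite_failure(open("maui.log")).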
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
