Here is an excerpt from maui.log for a scheduling iteration in which the job did not start:
06/23 10:22:55 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
06/23 10:22:55 INFO: total jobs selected in partition ALL: 1/1
06/23 10:22:55 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
06/23 10:22:55 INFO: total jobs selected in partition DEFAULT: 1/1
06/23 10:22:55 MQueueScheduleIJobs(Q,DEFAULT)
06/23 10:22:55 INFO: 180 feasible tasks found for job 42433:0 in partition DEFAULT (20 Needed)
06/23 10:22:55 ALERT: inadequate tasks to allocate to job 42433:0 (4 < 20)
06/23 10:22:55 ERROR: cannot allocate nodes to job '42433' in partition DEFAULT
06/23 10:22:55 MJobPReserve(42433,DEFAULT,ResCount,ResCountRej)
06/23 10:22:55 MJobReserve(42433,Priority)
06/23 10:22:55 INFO: 180 feasible tasks found for job 42433:0 in partition DEFAULT (20 Needed)
06/23 10:22:55 INFO: 180 feasible tasks found for job 42433:0 in partition DEFAULT (20 Needed)
06/23 10:22:55 INFO: located resources for 20 tasks (140) in best partition DEFAULT for job 42433 at time 00:00:01
06/23 10:22:55 INFO: tasks located for job 42433: 20 of 20 required (140 feasible)
06/23 10:22:55 MJobDistributeTasks(42433,SCFS.*FQDN*,NodeList,TaskMap)
06/23 10:22:55 MResJCreate(42433,MNodeList,00:00:01,Priority,Res)
06/23 10:22:55 INFO: job '42433' reserved 20 tasks (partition DEFAULT) to start in 00:00:01 on Mon Jun 23 10:22:56

Here is the iteration in which it ran, 2 minutes later (the job had been submitted almost 24 hours before):
06/23 10:24:05 MStatClearUsage([NONE],Idle)
06/23 10:24:05 INFO: total jobs selected (ALL): 1/12 [State: 11]
06/23 10:24:05 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
06/23 10:24:05 INFO: total jobs selected in partition ALL: 1/1
06/23 10:24:05 MQueueScheduleRJobs(Q)
06/23 10:24:05 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
06/23 10:24:05 INFO: total jobs selected in partition ALL: 1/1
06/23 10:24:05 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
06/23 10:24:05 INFO: total jobs selected in partition DEFAULT: 1/1
06/23 10:24:05 MQueueScheduleIJobs(Q,DEFAULT)
06/23 10:24:05 INFO: 180 feasible tasks found for job 42433:0 in partition DEFAULT (20 Needed)
06/23 10:24:05 INFO: tasks located for job 42433: 20 of 20 required (67 feasible)
06/23 10:24:05 MJobStart(42433)
06/23 10:24:05 MJobDistributeTasks(42433,SCFS.PITT.PENN.SEAGATE.COM ,NodeList,TaskMap)
06/23 10:24:05 MAMAllocJReserve(42433,RIndex,ErrMsg)
06/23 10:24:05 MRMJobStart(42433,Msg,SC)
06/23 10:24:05 MPBSJobStart(42433,SCFS.PITT.PENN.SEAGATE.COM,Msg,SC)
06/23 10:24:05 MPBSJobModify(42433,Resource_List,Resource,sc45:ppn=4+sc44:ppn=4+sc43:ppn=4+sc35:ppn=4+sc32:ppn=4)
06/23 10:24:05 MPBSJobModify(42433,Resource_List,Resource,20:ib)
06/23 10:24:05 INFO: job '42433' successfully started
06/23 10:24:05 MStatUpdateActiveJobUsage(42433)
06/23 10:24:05 MResJCreate(42433,MNodeList,00:00:00,ActiveJob,Res)
06/23 10:24:05 INFO: starting job '42433'
06/23 10:24:05 INFO: 1 jobs started on iteration 1378

Initially there was a single other job running, which was using 40 slots on 10 nodes (out of 47). There were other processes running on the nodes outside of torque/maui, but when counting by hand we found more than 5 nodes with a load less than 4, so there should have been enough capacity available for the job to run.
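The hand count described above can be sketched in Python as follows. This is only an illustration of the check we did by eye, not a description of how Maui itself decides; the load figures and the lower limit for sc01 are made-up placeholders, and the function name is hypothetical.

```python
# Sketch of the manual check: which nodes have a load below their MAXLOAD?
# All load values below are hypothetical; real ones would come from the nodes.

DEFAULT_MAXLOAD = 4.0  # matches the NODECFG[...] MAXLOAD=4.0 entries


def nodes_under_maxload(loads, maxload_overrides=None,
                        default_maxload=DEFAULT_MAXLOAD):
    """Return the sorted names of nodes whose load is below their limit."""
    overrides = maxload_overrides or {}
    return sorted(
        name for name, load in loads.items()
        if load < overrides.get(name, default_maxload)
    )


# Hypothetical snapshot of per-node load averages.
sample_loads = {
    "sc32": 0.10, "sc35": 0.25, "sc43": 0.00,
    "sc44": 3.90, "sc45": 4.20, "sc01": 2.50,
}
# Hypothetical lower limit for a node that runs software outside the queue.
sample_overrides = {"sc01": 2.0}

eligible = nodes_under_maxload(sample_loads, sample_overrides)
print(len(eligible), "nodes under their MAXLOAD:", eligible)
```

Note that a node passing this test (sc44 at 3.90 in the sample) is not necessarily able to take 4 more tasks; the sketch mirrors our count and nothing more.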
Just before it ran, I had submitted a number of single-process jobs to see whether they would be scheduled. Maui scheduled and ran all 10 of them without a problem, and then in the same iteration job 42433 ran.

From maui.cfg: we have an entry like the following for each node, though some nodes have lower limits because they run software outside the queue.

NODECFG[sc01] MAXLOAD=4.0

We also have the following in the file:

USERCFG[DEFAULT] MAXJOB=150,200
NODEALLOCATIONPOLICY CPULOAD
NODELOADPOLICY ADJUSTSTATE
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
QUEUETIMEWEIGHT 1

All the nodes are identical, with dual dual-core CPUs.

Any thoughts or suggestions are appreciated.

Thanks,
Rob
