Hello, I have a 10-node cluster and 3 jobs: the first needs 2 nodes (1 task per node), the second needs 4 nodes (1 task per node), and the third needs 4 nodes (2 tasks on 1 node and 1 task on each of the other 3 nodes).
The additional configuration in maui.cfg is:

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
ENABLEMULTIREQJOBS    TRUE
NODEALLOCATIONPOLICY  MINRESOURCE
NODEACCESSPOLICY      SINGLEJOB
JOBNODEMATCHPOLICY    EXACTNODE

I am observing that when the first 2 jobs are running, the third one does not start (even though 4 nodes are available) until one of the running jobs completes. checkjob -v <job_id> shows the following output:

------------------
checking job 5791 (RM job '5791.fire16.csa.local')

State: Idle
Creds:  user:kunal  group:kunal  class:batch  qos:DEFAULT
WallTime: 00:00:00 of 00:04:51
SubmitTime: Wed May 23 11:52:04
  (Time Queued  Total: 00:48:52  Eligible: 00:48:52)

StartDate: 00:00:01  Wed May 23 12:40:57
Total Tasks: 2

Req[0]  TaskCount: 2  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SINGLEJOB
TasksPerNode: 2  NodeCount: 1

Req[1]  TaskCount: 3  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SINGLEJOB
NodeCount: 3

IWD: [NONE]  Executable: [NONE]
Bypass: 5  StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE

Reservation '5791' (00:00:01 -> 00:04:52  Duration: 00:04:51)
PE: 5.00  StartPriority: 48
cannot select job 5791 for partition DEFAULT (startdate in '00:00:01')
------------

What could be the reason this job does not start, and how do I resolve it?

Thanks,
Kunal
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
