There was a similar post earlier: http://www.clusterresources.com/pipermail/mauiusers/2009-July/003930.html

However, it did not receive any response. Can anyone please offer some ideas or suggestions on this issue?

Thanks,
Kunal

On Wed, May 23, 2012 at 2:26 PM, Kunal Rao <[email protected]> wrote:
> Hello,
>
> I have a 10-node cluster and 3 jobs: one that needs 2 nodes (with 1
> task per node), another that needs 4 nodes (with 1 task per node), and a
> third that needs 4 nodes (with 2 tasks on 1 node and 1 task on each of
> the other 3 nodes).
>
> The additional configuration in maui.cfg is:
>
> BACKFILLPOLICY FIRSTFIT
> RESERVATIONPOLICY CURRENTHIGHEST
>
> ENABLEMULTIREQJOBS TRUE
> NODEALLOCATIONPOLICY MINRESOURCE
> NODEACCESSPOLICY SINGLEJOB
> JOBNODEMATCHPOLICY EXACTNODE
>
> I am observing that while the first 2 jobs are running, the third one does
> not start (even though 4 nodes are available) until one of the jobs
> completes. Running checkjob -v <job_id> shows the following output:
>
> ------------------
>
> checking job 5791 (RM job '5791.fire16.csa.local')
>
> State: Idle
> Creds: user:kunal group:kunal class:batch qos:DEFAULT
> WallTime: 00:00:00 of 00:04:51
> SubmitTime: Wed May 23 11:52:04
> (Time Queued Total: 00:48:52 Eligible: 00:48:52)
>
> StartDate: 00:00:01 Wed May 23 12:40:57
> Total Tasks: 2
>
> Req[0] TaskCount: 2 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
> Exec: '' ExecSize: 0 ImageSize: 0
> Dedicated Resources Per Task: PROCS: 1
> NodeAccess: SINGLEJOB
> TasksPerNode: 2 NodeCount: 1
>
> Req[1] TaskCount: 3 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
> Exec: '' ExecSize: 0 ImageSize: 0
> Dedicated Resources Per Task: PROCS: 1
> NodeAccess: SINGLEJOB
> NodeCount: 3
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 5 StartCount: 0
> PartitionMask: [ALL]
> Flags: RESTARTABLE
>
> Reservation '5791' (00:00:01 -> 00:04:52 Duration: 00:04:51)
> PE: 5.00 StartPriority: 48
> cannot select job 5791 for partition DEFAULT (startdate in '00:00:01')
>
> ------------
>
> Why might this job not be starting? How do I resolve this?
>
> Thanks,
> Kunal
>
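As a quick sanity check of the "4 nodes are available" claim, here is a minimal Python sketch of the node arithmetic. It assumes, hypothetically since the post does not say, that all 10 nodes are identical, up, and have at least 2 processors each, and it models NODEACCESSPOLICY SINGLEJOB as "each node a job uses is dedicated to that job". The names TOTAL_NODES, PROCS_PER_NODE, Req, nodes_used and fits are illustrative only and are not part of Maui. Under these assumptions the third job fits into the free nodes, which suggests the delay lies in how the scheduler handles the multi-req job rather than in raw capacity.

```python
from dataclasses import dataclass

# Hypothetical values: the post gives the node count but not procs per node.
TOTAL_NODES = 10
PROCS_PER_NODE = 2


@dataclass
class Req:
    """One node-group request: node_count nodes, tasks_per_node tasks on each."""
    node_count: int
    tasks_per_node: int


def nodes_used(job: list[Req]) -> int:
    # Modelling SINGLEJOB: every node a job touches is dedicated to that job.
    return sum(r.node_count for r in job)


def fits(job: list[Req], free_nodes: int, procs_per_node: int) -> bool:
    # Enough whole free nodes, and no request places more tasks (1 proc each)
    # on a node than that node has processors.
    return nodes_used(job) <= free_nodes and all(
        r.tasks_per_node <= procs_per_node for r in job
    )


running = [
    [Req(node_count=2, tasks_per_node=1)],  # job 1: 2 nodes x 1 task
    [Req(node_count=4, tasks_per_node=1)],  # job 2: 4 nodes x 1 task
]
# Job 3 mirrors the checkjob output: Req[0] = 1 node x 2 tasks, Req[1] = 3 nodes x 1 task.
job3 = [Req(node_count=1, tasks_per_node=2), Req(node_count=3, tasks_per_node=1)]

free = TOTAL_NODES - sum(nodes_used(j) for j in running)
print(free)                              # 4
print(fits(job3, free, PROCS_PER_NODE))  # True
```

If the nodes actually had only 1 processor each, `fits(job3, free, 1)` would return False, so it is worth confirming the per-node processor counts (e.g. in the resource manager's node list) before digging into the policy settings.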
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
