Hi all,
I have three types of queues defined on my cluster (x86-64, SLES9SP3,
torque120p6, maui326p14-snap1129921819): 'cpu-2' ==> dual-Opterons, 'cpu-8'
==> octuple-Opteron, and 'dualcore' ==> dual-core dual-Opteron.
Everything runs fine until the 'cpu-2' queue fills up and some 'cpu-2'
jobs have to wait. In this situation no pending 'dualcore' job starts,
even though there is an idle CPU on the dualcore machine.
Does anybody have an idea what I configured wrong? Any help is appreciated.
Thanks in advance,
Thomas Dargel.
Attached: outputs from 'checkjob 6010' (the queued job),
'checknode node24' (the dualcore machine)
and a snippet of the logfile (LOGLEVEL 6).
I will provide more information, if needed.
--
--------------------------------------------------------------------------------
Thomas Dargel Raum: 3'325 Tel.: +(49)30 2093-7143/4
Humboldt-Universitaet zu Berlin Fax.: +(49)30 2093-7136
Institut fuer Chemie
AG Quantenchemie, Prof. Sauer
Brook-Taylor-Str. 2 Mail: td AT chemie.hu-berlin.de
D-12489 Berlin - Adlershof
--------------------------------------------------------------------------------
03/03 11:09:28 MReqCheckResourceMatch(BFWindow,0,node24,RIndex)
03/03 11:09:28 INFO: node node24 can provide resources for job BFWindow:0
03/03 11:09:28 MJobCheckNRes(BFWindow,node24,RQ[0],00:00:00,TCAvail,1.000,RIndex,Affinity,FeasCheck)
03/03 11:09:28 MReqCheckResourceMatch(BFWindow,0,node24,RIndex)
03/03 11:09:28 INFO: node node24 can provide resources for job BFWindow:0
03/03 11:09:28 MJobCheckNStartTime(BFWindow,RQ,node24,00:00:00,TasksAllowed,1.000000,RIndex,Affinity)
03/03 11:09:28 MRECheck(node24,MJobGetSNRange-Start,FORCE)
03/03 11:09:28 INFO: resources available at time -1:22:24:30 during 6006 start
03/03 11:09:28 INFO: adjusting 'preactive' ARange[0] taskcount from 3 to 2
03/03 11:09:28 INFO: adjusting 'preactive' ARange[0] taskcount from 2 to 1
03/03 11:09:28 INFO: ARange[1] (1149853497 -> 1149878887)x2 too late for job BFWindow by 8472929
03/03 11:09:28 INFO: ARange[2] (1149878887 -> 1149944218)x3 too late for job BFWindow by 8498319
03/03 11:09:28 INFO: ARange[3] (1149944218 -> 2140000000)x4 too late for job BFWindow by 8563650
03/03 11:09:28 INFO: node node24 supports 1 task of job BFWindow:0 for 98:08:38:39 at 00:00:00
03/03 11:09:28 MRECheck(node24,MJobGetSNRange-Start,FORCE)
03/03 11:09:28 INFO: resources available at time -1:22:24:30 during 6006 start
03/03 11:09:28 INFO: adjusting 'preactive' ARange[0] taskcount from 3 to 2
03/03 11:09:28 INFO: adjusting 'preactive' ARange[0] taskcount from 2 to 1
03/03 11:09:28 INFO: ARange[1] (1149853497 -> 1149878887)x2 too late for job BFWindow by 8472929
03/03 11:09:28 INFO: ARange[2] (1149878887 -> 1149944218)x3 too late for job BFWindow by 8498319
03/03 11:09:28 INFO: ARange[3] (1149944218 -> 2140000000)x4 too late for job BFWindow by 8563650
03/03 11:09:28 INFO: node node24 supports 1 task of job BFWindow:0 for 98:08:38:39 at 00:00:00
03/03 11:09:28 INFO: backfill window: time: INFINITY nodes: 0 tasks: 0 mintime: 8498319 (idle nodes: 0)
03/03 11:09:28 MPolicyAdjustUsage(NULL,6074,NULL,idle,PU,[ALL],-1,NULL)
checking node node24
State: Running (in current state for 00:00:00)
Configured Resources: PROCS: 4 MEM: 7968M SWAP: 7968M DISK: 1M
Utilized Resources: [NONE]
Dedicated Resources: PROCS: 3 MEM: 5970M
Opsys: linux Arch: [NONE]
Speed: 1.00 Load: 2.990
Location: Partition: DEFAULT Frame/Slot: 1/1
Network: [DEFAULT]
Features: [dc]
Attributes: [Batch]
Classes: [cpu-2 4:4][cpu-8 4:4][mixpipe 4:4][dualcore 1:4]
Total Time: 76:02:55:49 Up: 69:06:05:46 (90.98%) Active: 45:07:30:38 (59.53%)
Reservations:
Job '6008'(x1) -1:15:25:51 -> 98:08:34:08 (99:23:59:59)
Job '6006'(x1) -1:22:29:01 -> 98:01:30:58 (99:23:59:59)
Job '6009'(x1) -21:17:00 -> 99:02:42:59 (99:23:59:59)
JobList: 6006,6008,6009
checking job 6010 (RM job '6010.cnode01.mauicluster')
State: Idle
Creds: user:jd group:qc class:dualcore qos:DEFAULT
WallTime: 00:00:00 of 99:23:59:59
SubmitTime: Wed Mar 1 10:00:06
(Time Queued Total: 2:01:12:53 Eligible: 2:01:12:53)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [dc]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 1990M
NodeAccess: SHARED
TasksPerNode: 1 NodeCount: 1
IWD: [NONE] Executable: [NONE]
Bypass: 13 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
PE: 1.00 StartPriority: 97387
job can run in partition DEFAULT (1 procs available. 1 procs required)