Hello,
we'd like to schedule jobs on a load-balanced basis, so we set
NODEALLOCATIONPOLICY to CPULOAD.
When we submit a job and there are enough completely free nodes,
everything is fine and the job runs. But if there aren't enough completely
free nodes and only partially free nodes are available, the job doesn't
start. It stays in the Idle state.
For example:
A job requests -l nodes=4:ppn=3
Available nodes (every node has 8 CPUs):
Node       Free CPUs
r1i1n4     4
r1i1n5     3
r1i1n7     6
r1i1n8     2
r1i1n9     2
r1i1n12    3
r1i1n13    3
r1i1n15    3
Six of these nodes have at least 3 free CPUs, so the requested resources
are available, but the job doesn't run.
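
To double-check that count, here is a minimal Python sketch. It is not Maui code and does not reproduce how Maui selects nodes. The helper names eligible_nodes and request_fits are made up for illustration, and the sketch assumes that nodes=4:ppn=3 means four distinct nodes with at least 3 free CPUs each.

```python
# Free CPUs per node, copied from the list above.
free_cpus = {
    "r1i1n4": 4,
    "r1i1n5": 3,
    "r1i1n7": 6,
    "r1i1n8": 2,
    "r1i1n9": 2,
    "r1i1n12": 3,
    "r1i1n13": 3,
    "r1i1n15": 3,
}


def eligible_nodes(free, ppn):
    """Return the names of nodes that have at least `ppn` free CPUs."""
    return [name for name, cpus in free.items() if cpus >= ppn]


def request_fits(free, nodes, ppn):
    """True if at least `nodes` distinct nodes each have `ppn` free CPUs.

    Assumption: this is a simplified reading of -l nodes=N:ppn=P,
    not the scheduler's actual allocation logic.
    """
    return len(eligible_nodes(free, ppn)) >= nodes


candidates = eligible_nodes(free_cpus, ppn=3)
print(candidates)
# ['r1i1n4', 'r1i1n5', 'r1i1n7', 'r1i1n12', 'r1i1n13', 'r1i1n15']
print(request_fits(free_cpus, nodes=4, ppn=3))
# True
```

Under this reading, six nodes qualify for a request that needs four. If the request is instead treated as 12 independent tasks (checkjob reports TaskCount: 12), the 26 free CPUs in total would also be enough.
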
The output of checkjob 8970:
State: Idle
Creds: user:black group:users class:batch qos:DEFAULT
WallTime: 00:00:00 of 41:16:00:00
SubmitTime: Thu Jun 11 15:01:53
(Time Queued Total: 00:16:26 Eligible: 00:16:26)
StartDate: 00:00:01 Thu Jun 11 15:18:20
Total Tasks: 12
Req[0] TaskCount: 12 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [batch]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
Reservation '8970' (00:00:01 -> 41:16:00:01 Duration: 41:16:00:00)
PE: 12.00 StartPriority: 16
cannot select job 8970 for partition DEFAULT (startdate in '00:00:01')
When we changed NODEALLOCATIONPOLICY from CPULOAD to MINRESOURCES, the job
started immediately, but we would like to keep load balancing.
Any suggestions?
Thank you.
Best regards
Jana Uhlirova