Dear all,
Michel Béland posted a similar question to the torqueusers mailing list
last September
(http://www.clusterresources.com/pipermail/torqueusers/2015-September/018281.html),
but it seems there has been no answer yet. I now have the same problem: Torque
5.1.1 uses a different syntax for exec_host and Maui cannot parse it. For
example, a job like this (submitted with -l nodes=1:ppn=16):
Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
239495.hydra3           user1       fwi      vasp-test        4826   1     16     --     400:00:00 R 07:49:52
is seen by Maui as a job using only one CPU:
checking node ion033
State: Running (in current state for 00:00:00)
Expected State: Idle SyncDeadline: Sat Oct 24 14:26:40
Configured Resources: PROCS: 32 MEM: 251G SWAP: 251G DISK: 1M
Utilized Resources: SWAP: 21G
Dedicated Resources: PROCS: 1
Opsys: ubuntu Arch: x64
Speed: 1.00 Load: 16.000
Network: [DEFAULT]
Features: [NONE]
Attributes: [Batch]
Classes: [default 32:32][fwd 32:32][fwi 31:32][short 32:32][long 32:32][benchmark 32:32]
Total Time: 25:08:23:38 Up: 25:08:23:38 (100.00%) Active: 12:08:21:18 (48.71%)
Reservations:
Job '239495'(x1) -7:53:02 -> 16:08:06:58 (16:16:00:00)
User 'ion.0.0'(x1) -1:29:36 -> INFINITY ( INFINITY)
Blocked Resources@00:00:00 Procs: 31/32 (96.88%)
Blocked Resources@16:08:06:58 Procs: 32/32 (100.00%)
JobList: 239495
But Torque knows the right numbers:
pbsnodes ion033
ion033
state = free
power_state = Running
np = 32
ntype = cluster
jobs = 0-15/239495.hydra3
status = rectime=1456929826,macaddr=0c:c4:7a:51:59:7a,cpuclock=OnDemand:2301MHz,varattr=,jobs=239495.hydra3(cput=461367,energy_used=0,mem=14053384kb,vmem=20332124kb,walltime=28936,session_id=4826),state=free,netload=155302554882,gres=,loadave=16.00,ncpus=32,physmem=264049812kb,availmem=241724944kb,totmem=264049812kb,idletime=2253059,nusers=1,nsessions=1,sessions=4826,uname=Linux ion033 3.10.61 #2 SMP Mon Dec 1 11:42:57 CET 2014 x86_64,opsys=ubuntu,arch=x64
mom_service_port = 15002
mom_manager_port = 15003
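To illustrate what I mean by the syntax change: the pbsnodes output above reports the slots as a range (0-15/239495.hydra3) instead of one entry per slot. The following is only a minimal Python sketch of the translation Maui would need. It assumes that exec_host uses the form ion033/0-15 in the new Torque and ion033/0+ion033/1+... in the old one; the function names are made up for this example and are not part of Torque or Maui.

```python
def expand_slots(spec):
    """Expand a slot list such as '0-15' or '0-3,8' into a list of ints."""
    slots = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range like '0-15' covers both end points.
            low, high = part.split("-", 1)
            slots.extend(range(int(low), int(high) + 1))
        else:
            slots.append(int(part))
    return slots


def expand_exec_host(exec_host):
    """Rewrite 'host/0-15' (assumed new form) as 'host/0+host/1+...' (assumed old form)."""
    entries = []
    for chunk in exec_host.split("+"):
        host, _, spec = chunk.partition("/")
        for slot in expand_slots(spec):
            entries.append(f"{host}/{slot}")
    return "+".join(entries)


if __name__ == "__main__":
    old_style = expand_exec_host("ion033/0-15")
    # One entry per slot, which is what Maui appears to count.
    print(len(old_style.split("+")))
```

With the range expanded this way, the job would be counted with 16 slots rather than 1.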
Any other job requesting N CPUs on this node (with N greater than the number
of CPUs that are actually free, but smaller than the total number of CPUs of
the node) is assigned to this node by Maui, but rejected by Torque with
something like:
PBS_Server.27269;Job;239534.hydra3;could not locate requested resources
'ion033:ppn=24' (node_spec failed) job allocation request exceeds currently
available cluster nodes, 1 requested, 0 available
and goes to state 'Deferred'.
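The range of affected job sizes follows from the numbers above. As a small sketch in Python (the function name is just for illustration), using np = 32, 16 slots really in use, and the 1 dedicated processor that Maui reports:

```python
def mismatch_window(np_total, procs_in_use, procs_seen_by_maui):
    """Return the job sizes that Maui would place but Torque would reject."""
    torque_free = np_total - procs_in_use       # slots Torque considers free
    maui_free = np_total - procs_seen_by_maui   # slots Maui believes are free
    # Jobs larger than torque_free but not larger than maui_free are affected.
    return range(torque_free + 1, maui_free + 1)


if __name__ == "__main__":
    window = mismatch_window(np_total=32, procs_in_use=16, procs_seen_by_maui=1)
    print(window[0], window[-1])
    print(24 in window)
```

The ppn=24 request from the log message falls inside this window, which matches the rejection shown above.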
Is there any way to switch back to the old Torque syntax, or is it possible
to modify Maui so that it understands the new one correctly? Or is there any
other way to resolve the problem?
Btw: any kind of answer like 'You should use Moab instead of Maui' or 'You
should switch to SLURM' does not really help.
Thanks a lot in advance,
Henrik
--
Dr. Henrik Schulz
Helmholtz-Zentrum Dresden-Rossendorf