Dear all,

Michel Béland posted a similar question to the torqueusers mailing list 
last September 
(http://www.clusterresources.com/pipermail/torqueusers/2015-September/018281.html),
but it seems there has been no answer yet. I now have the same problem: Torque 
5.1.1 uses a different syntax for exec_host, and Maui cannot parse it. For 
example, a job like this (submitted with -l nodes=1:ppn=16):

Job ID                  Username    Queue    Jobname          SessID NDS   TSK    Memory Time      S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
239495.hydra3           user1       fwi      vasp-test          4826     1     16     -- 400:00:00 R  07:49:52

is seen by Maui as a job using only one CPU:

checking node ion033

State:   Running  (in current state for 00:00:00)
Expected State:     Idle   SyncDeadline: Sat Oct 24 14:26:40
Configured Resources: PROCS: 32  MEM: 251G  SWAP: 251G  DISK: 1M
Utilized   Resources: SWAP: 21G
Dedicated  Resources: PROCS: 1
Opsys:        ubuntu  Arch:         x64
Speed:      1.00  Load:      16.000
Network:    [DEFAULT]
Features:   [NONE]
Attributes: [Batch]
Classes:    [default 32:32][fwd 32:32][fwi 31:32][short 32:32][long 32:32][benchmark 32:32]

Total Time: 25:08:23:38  Up: 25:08:23:38 (100.00%)  Active: 12:08:21:18 (48.71%)

Reservations:
  Job '239495'(x1)  -7:53:02 -> 16:08:06:58 (16:16:00:00)
  User 'ion.0.0'(x1)  -1:29:36 ->   INFINITY (  INFINITY)
    Blocked Resources@00:00:00    Procs: 31/32 (96.88%)
    Blocked Resources@16:08:06:58 Procs: 32/32 (100.00%)
JobList:  239495

But Torque knows the right numbers:

pbsnodes ion033
ion033
     state = free
     power_state = Running
     np = 32
     ntype = cluster
     jobs = 0-15/239495.hydra3
     status = rectime=1456929826,macaddr=0c:c4:7a:51:59:7a,cpuclock=OnDemand:2301MHz,varattr=,jobs=239495.hydra3(cput=461367,energy_used=0,mem=14053384kb,vmem=20332124kb,walltime=28936,session_id=4826),state=free,netload=155302554882,gres=,loadave=16.00,ncpus=32,physmem=264049812kb,availmem=241724944kb,totmem=264049812kb,idletime=2253059,nusers=1,nsessions=1,sessions=4826,uname=Linux ion033 3.10.61 #2 SMP Mon Dec 1 11:42:57 CET 2014 x86_64,opsys=ubuntu,arch=x64
     mom_service_port = 15002
     mom_manager_port = 15003
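
To make the mismatch concrete, here is a minimal Python sketch that counts the slots each job holds according to the "jobs" value printed by pbsnodes. The function name count_slots is hypothetical, and the handling of comma-separated index lists (such as "4,6-7/jobid") is an assumption about what the range syntax may look like on a fragmented node; only the "0-15/239495.hydra3" form appears in the output above.

```python
def count_slots(jobs_attr):
    """Count execution slots per job in a pbsnodes 'jobs' value.

    Accepts range entries such as '0-15/239495.hydra3' as well as
    single-index entries such as '0/239495.hydra3'.
    Assumption: index pieces without a '/' belong to the next job id.
    """
    slots = {}
    pending = 0  # slots seen so far that still wait for their job id
    for token in jobs_attr.split(","):
        token = token.strip()
        if not token:
            continue
        spec, sep, job_id = token.partition("/")
        first, dash, last = spec.partition("-")
        # A range 'a-b' covers b - a + 1 slots; a bare index covers one.
        pending += (int(last) - int(first) + 1) if dash else 1
        if sep:
            slots[job_id] = slots.get(job_id, 0) + pending
            pending = 0
    return slots


if __name__ == "__main__":
    print(count_slots("0-15/239495.hydra3"))
```

For the node above this gives 16 slots for job 239495.hydra3, which matches the TSK column of qstat, whereas Maui lists only PROCS: 1 as dedicated.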

Any other job requesting N CPUs on this node (with N greater than the number of 
free CPUs, but smaller than the total number of CPUs of the node) is assigned to 
this node by Maui, but rejected by Torque with something like:

PBS_Server.27269;Job;239534.hydra3;could not locate requested resources 'ion033:ppn=24' (node_spec failed) job allocation request exceeds currently available cluster nodes, 1 requested, 0 available

and goes to state 'Deferred'.

Is there any way to switch back to the old Torque syntax, or is it possible 
to modify Maui so that it parses the new syntax correctly? Or is there any 
other way to resolve the problem?
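
As an illustration of the translation that would be needed, here is a minimal Python sketch that expands a range-style exec_host string into a one-entry-per-slot list. It rests on assumptions that should be checked against the actual qstat -f output: that the new value looks like "ion033/0-15" (possibly with comma-separated pieces), that entries for several hosts are joined with "+", and that the old form listed "host/index" once per slot. The function name expand_exec_host is hypothetical; this is not code from Maui or Torque.

```python
def expand_exec_host(exec_host):
    """Expand 'host/0-2' style entries into 'host/0+host/1+host/2'.

    Assumed formats (not confirmed by the output quoted above):
      new: 'ion033/0-15' or 'ion033/0-3,8', hosts joined with '+'
      old: 'ion033/0+ion033/1+...'
    Old-style input passes through unchanged.
    """
    parts = []
    for chunk in exec_host.split("+"):
        host, sep, spec = chunk.partition("/")
        if not sep:
            # No slot specification at all: keep the entry as it is.
            parts.append(chunk)
            continue
        for piece in spec.split(","):
            first, dash, last = piece.partition("-")
            start = int(first)
            end = int(last) if dash else start
            parts.extend(f"{host}/{i}" for i in range(start, end + 1))
    return "+".join(parts)


if __name__ == "__main__":
    print(expand_exec_host("ion033/0-3"))
```

The same expansion would have to happen wherever Maui reads exec_host from the server, so a sketch like this only shows the string handling, not where a patch would go.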

Btw: answers like 'You should use Moab instead of Maui' or 'You should 
switch to SLURM' do not really help.

Thanks a lot in advance,
Henrik

--
Dr. Henrik Schulz
Helmholtz-Zentrum Dresden-Rossendorf








_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
