I manage a moderately sized cluster running SLC5.5, which uses Maui and
Torque (PBS).

After upgrading the head node to SLC5.5 and upgrading Torque, the batch
system is behaving oddly.

Our 58 worker nodes are all identical dual quad-core boxes, i.e. they
have 8 cores each. For some reason, jobs are only scheduled until 7 of
the 8 cores are in use. Checking one of the queued jobs, I find that it is
not being scheduled because Maui claims that there are no free CPUs:

[root@######### server_priv]# checkjob -v 24933

checking job 24933 (RM job '24933.###########')

State: Idle
Creds:  user:szczypka  group:######  class:long  qos:DEFAULT
WallTime: 00:00:00 of 83:08:00:00
SubmitTime: Thu Feb 10 16:39:36
  (Time Queued  Total: 00:04:07  Eligible: 00:04:07)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1  MEM: 1000M
NodeAccess: SHARED
NodeCount: 0

IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       RESTARTABLE

PE:  1.00  StartPriority:  1
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 1 procs found)
idle procs: 464  feasible procs:   0

Rejection Reasons: [CPU          :   58]

Detailed Node Availability Information:

n01                      rejected : CPU
n02                      rejected : CPU
...
n57                      rejected : CPU
n58                      rejected : CPU

Yet this is clearly not the case: the same output reports 464 idle procs,
which is 58 nodes x 8 cores.
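
For anyone who wants to compare these numbers across several queued jobs, the
relevant fields can be pulled out of a saved "checkjob -v" listing with a short
script. This is only a rough Python sketch: the function name
summarize_checkjob and the regular expressions are assumptions based solely on
the output format shown above, and the sample text is a shortened copy of that
output.

```python
import re
from collections import Counter


def summarize_checkjob(text):
    """Return (idle procs, feasible procs, Counter of per-node rejection reasons).

    Hypothetical helper: it assumes the 'idle procs: N  feasible procs: M'
    line and the '<node>  rejected : <REASON>' lines look like the listing above.
    """
    procs = re.search(r"idle procs:\s*(\d+)\s+feasible procs:\s*(\d+)", text)
    idle = int(procs.group(1)) if procs else None
    feasible = int(procs.group(2)) if procs else None

    # One line per node, e.g. "n01                      rejected : CPU"
    node_line = re.compile(r"^(\S+)[ \t]+rejected[ \t]*:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
    reasons = Counter(m.group(2) for m in node_line.finditer(text))
    return idle, feasible, reasons


# Shortened copy of the checkjob output from this post.
sample = """\
idle procs: 464  feasible procs:   0

Rejection Reasons: [CPU          :   58]

n01                      rejected : CPU
n02                      rejected : CPU
n57                      rejected : CPU
n58                      rejected : CPU
"""

idle, feasible, reasons = summarize_checkjob(sample)
print("idle:", idle, "feasible:", feasible, "rejections:", dict(reasons))
print("idle procs equal 58 nodes x 8 cores:", idle == 58 * 8)
```

In practice the text would come from something like "checkjob -v 24933 > job.txt"
and be read from that file instead of the inline sample.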

Removing all the default resource requirements has no effect.

Interestingly, should I flood the cluster with jobs requiring 2, 4 or 8
processors (e.g. qsub -l nodes=1:ppn=8) then the jobs will fill the cluster
entirely.
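
The flood test can be reproduced with a small script along these lines. It is
a sketch under assumptions, not what was actually run: the helper names
flood_commands and submit and the "sleep 600" job body are made up for
illustration, and only the "qsub -l nodes=1:ppn=N" request comes from the
test described above. It defaults to a dry run so that nothing is submitted
by accident.

```python
import subprocess


def flood_commands(ppn_values=(2, 4, 8), copies=1):
    """Build one qsub argument list per job, 'copies' jobs for each ppn value."""
    return [
        ["qsub", "-l", f"nodes=1:ppn={ppn}"]
        for ppn in ppn_values
        for _ in range(copies)
    ]


def submit(commands, script="sleep 600\n", dry_run=True):
    """Print each command; call qsub only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # With no script file argument, qsub reads the job script from stdin.
            subprocess.run(cmd, input=script, text=True, check=True)


cmds = flood_commands()
submit(cmds)  # dry run: prints the qsub commands without submitting anything
```

Calling submit(flood_commands(copies=60), dry_run=False) on the head node would
queue enough multi-core jobs to exercise every worker node.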

Below is our maui.cfg:

"""
SERVERHOST            ##########
ADMIN1                root
ADMIN3                ALL

RMCFG[############] TYPE=PBS

AMCFG[bank]  TYPE=NONE

RMPOLLINTERVAL        00:00:10

SERVERPORT            42559
SERVERMODE            NORMAL

LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

FSPOLICY             DEDICATEDPES
FSDEPTH              30
FSINTERVAL           2:00:00
FSDECAY              0.80
FSWEIGHT             500
FSUSERWEIGHT         10

JOBPRIOACCRUALPOLICY ALWAYS

XFACTORWEIGHT        3
XFWEIGHT             7
XFCAP                1000000
XFMINWCLIMIT         0:01:00

BACKFILLPOLICY       BESTFIT

RESERVATIONPOLICY     CURRENTHIGHEST

NODEALLOCATIONPOLICY PRIORITY
NODECFG[DEFAULT]     PRIORITYF='-1 * JOBCOUNT'

USERWEIGHT           1

USERCFG[DEFAULT]     PRIORITY=10000
USERCFG[DEFAULT]     FSTARGET=10.0

CLASSCFG[data]       PRIORITY=100000
CLASSCFG[align]      PRIORITY=100000
CLASSCFG[data5]      PRIORITY=100000
CLASSCFG[dirac]      MAXJOB=5
"""



Does anyone have any advice or thoughts on what might be causing this?

Thanks,

Paul.
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
