Hello,

I'm seeing strange behavior from my Maui process.  I am running a cluster of 
2127 nodes.  On the master node, Maui stays at a constant 100% CPU usage.  
There are currently about 750 jobs in the queue, and Maui no longer responds to 
CLI commands (e.g., showq); they simply time out.
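To put a number on "times out", a small probe like the following could be run periodically from the master node. This is only a sketch: the helper name probe is made up for illustration, and the 90-second default simply mirrors the CLIENTTIMEOUT of 00:01:30 in the config below.

```python
import subprocess


def probe(command, timeout_s=90):
    """Run a command and classify the outcome.

    Returns "ok", "failed", "timeout", or "missing".
    The 90 s default is an assumption taken from CLIENTTIMEOUT 00:01:30.
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The command did not finish within timeout_s seconds.
        return "timeout"
    except FileNotFoundError:
        # The executable is not on PATH.
        return "missing"
    return "ok" if result.returncode == 0 else "failed"
```

Calling probe(["showq"]) would then report whether the scheduler answered at all, which may help show when the hang begins relative to queue size.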

Also, I have LOGLEVEL set to 0, but Maui is still writing thousands of 
WARNING and ERROR messages to the log.  I am not sure whether these messages 
are related to the problem:

01/25 08:55:58 WARNING:  cannot allocate tasks for job 43251 at   INFINITY
01/25 08:55:58 WARNING:  cannot allocate tasks for job 43251 at   INFINITY
01/25 08:55:58 WARNING:  cannot allocate tasks for job 43251 at   INFINITY
01/25 08:55:58 WARNING:  cannot allocate tasks for job 43251 at   INFINITY
01/25 08:55:58 ERROR:    cannot allocate tasks for job 43251 at any time
01/25 08:55:58 WARNING:  cannot allocate tasks for job 43252 at   INFINITY
01/25 08:55:58 WARNING:  cannot allocate tasks for job 43252 at   INFINITY
01/25 08:55:58 WARNING:  cannot allocate tasks for job 43252 at   INFINITY
01/25 08:55:58 WARNING:  cannot allocate tasks for job 43252 at   INFINITY
01/25 08:55:58 WARNING:  cannot allocate tasks for job 43252 at   INFINITY

Each job generates dozens of these messages.  It seems to me that if there are 
simply not enough CURRENT resources for a job, that shouldn't count as a 
WARNING or ERROR level condition, so I suspect something else is wrong.
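To see which jobs account for most of the noise, the messages can be tallied per job. The following is a rough sketch; the regular expression is inferred only from the excerpt above, so other Maui message formats will not match, and the function name summarize is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt in this message (an assumption):
# "MM/DD HH:MM:SS SEVERITY:  cannot allocate tasks for job <id> at ..."
LINE_RE = re.compile(
    r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2} (WARNING|ERROR):\s+"
    r"cannot allocate tasks for job (\d+) at"
)


def summarize(lines):
    """Count 'cannot allocate tasks' messages per (job id, severity)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            severity, job_id = match.groups()
            counts[(job_id, severity)] += 1
    return counts


# Two lines copied from the excerpt above, used as sample input.
sample = [
    "01/25 08:55:58 WARNING:  cannot allocate tasks for job 43251 at   INFINITY",
    "01/25 08:55:58 ERROR:    cannot allocate tasks for job 43251 at any time",
]
for (job_id, severity), n in sorted(summarize(sample).items()):
    print(f"job {job_id}: {n} {severity} message(s)")
```

Passing an open file handle for maui.log instead of the sample list gives the same tally for the whole log.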

Here is our (partial) maui.cfg file.  Thanks for any help anyone can provide...

# maui.cfg 3.2.6p16

SERVERHOST            xlch
# primary admin must be first in list
ADMIN1                root
ADMIN2                disco
ADMIN3                ALL

# Resource Manager Definition

RMCFG[xlch] TYPE=PBS

# Allocation Manager Definition

#AMCFG[bank]  TYPE=NONE

# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration

RMPOLLINTERVAL        00:05:00

NODEPOLLFREQUENCY     3
CLIENTTIMEOUT         00:01:30

SERVERPORT            42559
SERVERMODE            NORMAL
#SERVERMODE             TEST

ENABLEMULTIREQJOBS    TRUE
#USEMACHINESPEED              TRUE

# Admin: http://supercluster.org/mauidocs/a.esecurity.html


LOGFILE               maui.log
LOGFILEMAXSIZE        500000000
LOGLEVEL              0

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html

QUEUETIMEWEIGHT       0

# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html

#FSPOLICY              PSDEDICATED
#FSDEPTH               7
#FSINTERVAL            86400
#FSDECAY               0.80

# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html

# NONE SPECIFIED

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html

BACKFILLPOLICY        FIRSTFIT
#RESERVATIONPOLICY     CURRENTHIGHEST
RESERVATIONPOLICY       NEVER


# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
#NODEALLOCATIONPOLICY  FASTEST
#NODEALLOCATIONPOLICY  PRIORITY
JOBNODEMATCHPOLICY    EXACTNODE
NODEACCESSPOLICY      SHARED

DEFERTIME             00

# QOS: http://supercluster.org/mauidocs/7.3qos.html

# QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE

# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html

# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test]   17:00:00
# SRDAYS[test]      MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test]   0:30:00

# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html

USERCFG[DEFAULT]      FSTARGET=25.0+
# USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
# GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch]       FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR
FSPOLICY              DEDICATEDPS
FSINTERVAL            24:00:00
FSDEPTH               12
FSDECAY               0.5
FSWEIGHT              100
FSUSERWEIGHT          100
FSQOSWEIGHT           0
FSGROUPWEIGHT         0

NODECFG[DEFAULT] PRIORITYF='PRIORITY * 1'


Blake Wickliffe
Saudi Aramco
ENOD/CSYS/USG HPC Team
(873-4417)
