Hello, I'm seeing strange behavior from my Maui process. I am running a cluster of 2127 nodes. On the master node, Maui stays at a constant 100% CPU usage. There are currently about 750 jobs in the queue, and Maui no longer responds to CLI commands (e.g., showq); they simply time out.
Also, I have logging set to "0", but I am still generating thousands of "WARNING" and "ERROR" messages in the log. I am not clear if the messages are relevant:

    01/25 08:55:58 WARNING: cannot allocate tasks for job 43251 at INFINITY
    01/25 08:55:58 WARNING: cannot allocate tasks for job 43251 at INFINITY
    01/25 08:55:58 WARNING: cannot allocate tasks for job 43251 at INFINITY
    01/25 08:55:58 WARNING: cannot allocate tasks for job 43251 at INFINITY
    01/25 08:55:58 ERROR: cannot allocate tasks for job 43251 at any time
    01/25 08:55:58 WARNING: cannot allocate tasks for job 43252 at INFINITY
    01/25 08:55:58 WARNING: cannot allocate tasks for job 43252 at INFINITY
    01/25 08:55:58 WARNING: cannot allocate tasks for job 43252 at INFINITY
    01/25 08:55:58 WARNING: cannot allocate tasks for job 43252 at INFINITY
    01/25 08:55:58 WARNING: cannot allocate tasks for job 43252 at INFINITY

Each job generates dozens of these messages. It seems to me that if there are simply not enough CURRENT resources for a job, that shouldn't count as a WARNING or ERROR level condition, so I suspect something else is wrong.

Here is our (partial) maui.cfg file. Thanks for any help anyone can provide...

    # maui.cfg 3.2.6p16

    SERVERHOST            xlch
    # primary admin must be first in list
    ADMIN1                root
    ADMIN2                disco
    ADMIN3                ALL

    # Resource Manager Definition
    RMCFG[xlch] TYPE=PBS

    # Allocation Manager Definition
    #AMCFG[bank] TYPE=NONE

    # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
    # use the 'schedctl -l' command to display current configuration

    RMPOLLINTERVAL        00:05:00
    NODEPOLLFREQUENCY     3
    CLIENTTIMEOUT         00:01:30

    SERVERPORT            42559
    SERVERMODE            NORMAL
    #SERVERMODE           TEST
    ENABLEMULTIREQJOBS    TRUE
    #USEMACHINESPEED      TRUE

    # Admin: http://supercluster.org/mauidocs/a.esecurity.html
    LOGFILE               maui.log
    LOGFILEMAXSIZE        500000000
    LOGLEVEL              0

    # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
    QUEUETIMEWEIGHT       0

    # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
    #FSPOLICY             PSDEDICATED
    #FSDEPTH              7
    #FSINTERVAL           86400
    #FSDECAY              0.80

    # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
    # NONE SPECIFIED

    # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
    BACKFILLPOLICY        FIRSTFIT
    #RESERVATIONPOLICY    CURRENTHIGHEST
    RESERVATIONPOLICY     NEVER

    # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
    #NODEALLOCATIONPOLICY FASTEST
    #NODEALLOCATIONPOLICY PRIORITY
    JOBNODEMATCHPOLICY    EXACTNODE
    NODEACCESSPOLICY      SHARED
    DEFERTIME             00

    # QOS: http://supercluster.org/mauidocs/7.3qos.html
    # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
    # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE

    # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
    # SRSTARTTIME[test] 8:00:00
    # SRENDTIME[test]   17:00:00
    # SRDAYS[test]      MON TUE WED THU FRI
    # SRTASKCOUNT[test] 20
    # SRMAXTIME[test]   0:30:00

    # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
    USERCFG[DEFAULT]      FSTARGET=25.0+
    # USERCFG[john]       PRIORITY=100 FSTARGET=10.0-
    # GROUPCFG[staff]     PRIORITY=1000 QLIST=hi:low QDEF=hi
    # CLASSCFG[batch]     FLAGS=PREEMPTEE
    # CLASSCFG[interactive] FLAGS=PREEMPTOR

    FSPOLICY              DEDICATEDPS
    FSINTERVAL            24:00:00
    FSDEPTH               12
    FSDECAY               0.5
    FSWEIGHT              100
    FSUSERWEIGHT          100
    FSQOSWEIGHT           0
    FSGROUPWEIGHT         0

    NODECFG[DEFAULT]      PRIORITYF='PRIORITY * 1'

Blake Wickliffe
Saudi Aramco ENOD/CSYS/USG HPC Team (873-4417)
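
P.S. In case it helps anyone reproduce the "dozens of messages per job" observation, here is a minimal sketch of how the allocation messages could be tallied per job. It assumes every relevant log line has exactly the shape shown in the excerpt above (date, time, level, "cannot allocate tasks for job N at ..."); the function name tally_allocation_messages is made up for this illustration and is not part of Maui.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt in this post:
#   01/25 08:55:58 WARNING: cannot allocate tasks for job 43251 at INFINITY
LINE_RE = re.compile(
    r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2} (WARNING|ERROR):\s+"
    r"cannot allocate tasks for job (\d+) at "
)


def tally_allocation_messages(lines):
    """Count 'cannot allocate tasks' messages per (job id, level).

    Hypothetical helper for illustration; lines that do not match the
    assumed format are silently skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            level, job_id = match.groups()
            counts[(job_id, level)] += 1
    return counts


# Small demonstration using lines copied from the excerpt above.
sample = [
    "01/25 08:55:58 WARNING: cannot allocate tasks for job 43251 at INFINITY",
    "01/25 08:55:58 ERROR: cannot allocate tasks for job 43251 at any time",
    "01/25 08:55:58 WARNING: cannot allocate tasks for job 43252 at INFINITY",
]
for (job_id, level), n in sorted(tally_allocation_messages(sample).items()):
    print(f"job {job_id}  {level:<7}  {n}")
```

To run it against the real log, pass an open file handle for maui.log instead of the sample list; the function iterates over it line by line.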
