Hello,

For the past few months the Maui service has been crashing with the following error:

11/18 10:24:30 INFO: active PBS job 3538 has been removed from the queue. assuming successful completion
11/18 10:24:30 MJobProcessCompleted(3538)
11/18 10:24:30 MAMAllocJDebit(A,3538,SC,ErrMsg)

I'm running the Torque-1.3.6 resource manager and Maui 3.2.6p21 on a ROCKS 5.1 cluster. I've attached the output from showconfig. Any thoughts as to why Maui is crashing?

# Maui version 3.2.6p21 (PID: 7380)

# global policies
REJECTNEGPRIOJOBS[0] FALSE
ENABLENEGJOBPRIORITY[0] FALSE
ENABLEMULTINODEJOBS[0] TRUE
ENABLEMULTIREQJOBS[0] TRUE
BFPRIORITYPOLICY[0] [NONE]
JOBPRIOACCRUALPOLICY QUEUEPOLICY
NODELOADPOLICY ADJUSTSTATE
USEMACHINESPEED FALSE
USESYSTEMQUEUETIME TRUE
USELOCALMACHINEPRIORITY FALSE
NODEUNTRACKEDLOADFACTOR 1.2
JOBNODEMATCHPOLICY[0]
JOBMAXSTARTTIME[0] INFINITY
METAMAXTASKS[0] 0
NODESETPOLICY[0] [NONE]
NODESETATTRIBUTE[0] [NONE]
NODESETLIST[0]
NODESETDELAY[0] 00:00:00
NODESETPRIORITYTYPE[0] MINLOSS
NODESETTOLERANCE[0] 0.00
BACKFILLPOLICY[0] BESTFIT
BACKFILLDEPTH[0] 0
BACKFILLPROCFACTOR[0] 0
BACKFILLMAXSCHEDULES[0] 10000
BACKFILLMETRIC[0] PROCS
BFCHUNKDURATION[0] 00:00:00
BFCHUNKSIZE[0] 0
PREEMPTPOLICY[0] REQUEUE
MINADMINSTIME[0] 00:00:00
RESOURCELIMITPOLICY[0]
NODEAVAILABILITYPOLICY[0] COMBINED:[DEFAULT]
NODEALLOCATIONPOLICY[0] FIRSTAVAILABLE
TASKDISTRIBUTIONPOLICY[0] DEFAULT
RESERVATIONPOLICY[0] NEVER
RESERVATIONRETRYTIME[0] 00:00:00
RESERVATIONTHRESHOLDTYPE[0] NONE
RESERVATIONTHRESHOLDVALUE[0] 0
FSPOLICY [NONE]
FSPOLICY [NONE]
FSINTERVAL 12:00:00
FSDEPTH 8
FSDECAY 1.00

# Priority Weights
SERVICEWEIGHT[0] 1
TARGETWEIGHT[0] 1
CREDWEIGHT[0] 1
ATTRWEIGHT[0] 1
FSWEIGHT[0] 1
RESWEIGHT[0] 1
USAGEWEIGHT[0] 1
QUEUETIMEWEIGHT[0] 1
XFACTORWEIGHT[0] 0
SPVIOLATIONWEIGHT[0] 0
BYPASSWEIGHT[0] 0
TARGETQUEUETIMEWEIGHT[0] 0
TARGETXFACTORWEIGHT[0] 0
USERWEIGHT[0] 0
GROUPWEIGHT[0] 0
ACCOUNTWEIGHT[0] 0
QOSWEIGHT[0] 0
CLASSWEIGHT[0] 0
FSUSERWEIGHT[0] 0
FSGROUPWEIGHT[0] 0
FSACCOUNTWEIGHT[0] 0
FSQOSWEIGHT[0] 0
FSCLASSWEIGHT[0] 0
ATTRATTRWEIGHT[0] 0
ATTRSTATEWEIGHT[0] 0
NODEWEIGHT[0] 0
PROCWEIGHT[0] 0
MEMWEIGHT[0] 0
SWAPWEIGHT[0] 0
DISKWEIGHT[0] 0
PSWEIGHT[0] 0
PEWEIGHT[0] 0
WALLTIMEWEIGHT[0] 0
UPROCWEIGHT[0] 0
UJOBWEIGHT[0] 0
CONSUMEDWEIGHT[0] 0
USAGEEXECUTIONTIMEWEIGHT[0] 0
REMAININGWEIGHT[0] 0
PERCENTWEIGHT[0] 0
XFMINWCLIMIT[0] 00:02:00

# partition DEFAULT policies
REJECTNEGPRIOJOBS[1] FALSE
ENABLENEGJOBPRIORITY[1] FALSE
ENABLEMULTINODEJOBS[1] TRUE
ENABLEMULTIREQJOBS[1] FALSE
BFPRIORITYPOLICY[1] [NONE]
JOBPRIOACCRUALPOLICY QUEUEPOLICY
NODELOADPOLICY ADJUSTSTATE
JOBNODEMATCHPOLICY[1]
JOBMAXSTARTTIME[1] INFINITY
METAMAXTASKS[1] 0
NODESETPOLICY[1] [NONE]
NODESETATTRIBUTE[1] [NONE]
NODESETLIST[1]
NODESETDELAY[1] 00:00:00
NODESETPRIORITYTYPE[1] MINLOSS
NODESETTOLERANCE[1] 0.00

# Priority Weights
XFMINWCLIMIT[1] 00:00:00

RMAUTHTYPE[0] CHECKSUM

CLASSCFG[[NONE]] DEFAULT.FEATURES=[NONE]
CLASSCFG[[ALL]] DEFAULT.FEATURES=[NONE]
CLASSCFG[short] DEFAULT.FEATURES=[NONE]
CLASSCFG[gaussian] DEFAULT.FEATURES=[NONE]
CLASSCFG[bsu-research] DEFAULT.FEATURES=[NONE]
CLASSCFG[batch] DEFAULT.FEATURES=[NONE]
CLASSCFG[gaussbatch] DEFAULT.FEATURES=[NONE]
CLASSCFG[atk] DEFAULT.FEATURES=[NONE]
CLASSCFG[long] DEFAULT.FEATURES=[NONE]
CLASSCFG[routing] DEFAULT.FEATURES=[NONE]
CLASSCFG[matlab] DEFAULT.FEATURES=[NONE]
CLASSCFG[sequence] DEFAULT.FEATURES=[NONE]

QOSPRIORITY[0] 0
QOSQTWEIGHT[0] 0
QOSXFWEIGHT[0] 0
QOSTARGETXF[0] 0.00
QOSTARGETQT[0] 00:00:00
QOSFLAGS[0]
QOSPRIORITY[1] 0
QOSQTWEIGHT[1] 0
QOSXFWEIGHT[1] 0
QOSTARGETXF[1] 0.00
QOSTARGETQT[1] 00:00:00
QOSFLAGS[1]
QOSPRIORITY[2] 100
QOSQTWEIGHT[2] 0
QOSXFWEIGHT[2] 0
QOSTARGETXF[2] 0.00
QOSTARGETQT[2] 00:00:00
QOSFLAGS[2] PREEMPTOR
QOSPRIORITY[3] 500
QOSQTWEIGHT[3] 0
QOSXFWEIGHT[3] 0
QOSTARGETXF[3] 0.00
QOSTARGETQT[3] 00:00:00
QOSFLAGS[3] PREEMPTOR
QOSPRIORITY[4] -1000
QOSQTWEIGHT[4] 0
QOSXFWEIGHT[4] 0
QOSTARGETXF[4] 0.00
QOSTARGETQT[4] 00:00:00
QOSFLAGS[4] PREEMPTEE

# SERVER MODULES: MX
SERVERMODE NORMAL
SERVERNAME
SERVERHOST ccncluster.bsu.edu
SERVERPORT 42559
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGFILEROLLDEPTH 1
LOGLEVEL 3
LOGFACILITY fALL
SERVERHOMEDIR /usr/local/maui/
TOOLSDIR /usr/local/maui/tools/
LOGDIR /usr/local/maui/log/
STATDIR /usr/local/maui/stats/
LOCKFILE /usr/local/maui/maui.pid
SERVERCONFIGFILE /usr/local/maui/maui.cfg
CHECKPOINTFILE /usr/local/maui/maui.ck
CHECKPOINTINTERVAL 00:05:00
CHECKPOINTEXPIRATIONTIME 3:11:20:00
TRAPJOB
TRAPNODE
TRAPFUNCTION
RESDEPTH 24
RMPOLLINTERVAL 00:00:30
NODEACCESSPOLICY SHARED
ALLOCLOCALITYPOLICY [NONE]
SIMTIMEPOLICY [NONE]
ADMIN1 mauiuser root psdavis
ADMINHOSTS ALL
NODEPOLLFREQUENCY 0
DISPLAYFLAGS
DEFAULTDOMAIN
DEFAULTCLASSLIST [DEFAULT:1]
FEATURENODETYPEHEADER
FEATUREPROCSPEEDHEADER
FEATUREPARTITIONHEADER
DEFERTIME 1:00:00
DEFERCOUNT 24
DEFERSTARTCOUNT 1
JOBPURGETIME 0
NODEPURGETIME 2140000000
APIFAILURETHRESHHOLD 6
NODESYNCTIME 600
JOBSYNCTIME 600
JOBMAXOVERRUN 00:10:00
NODEMAXLOAD 0.0
PLOTMINTIME 120
PLOTMAXTIME 245760
PLOTTIMESCALE 11
PLOTMINPROC 1
PLOTMAXPROC 512
PLOTPROCSCALE 9
SCHEDCFG[] MODE=NORMAL SERVER=ccncluster.bsu.edu:42559

# RM MODULES: PBS SSS WIKI NATIVE
RMCFG[base] AUTHTYPE=CHECKSUM EPORT=15004 TIMEOUT=00:00:09 TYPE=PBS

SIMWORKLOADTRACEFILE workload
SIMRESOURCETRACEFILE resource
SIMAUTOSHUTDOWN OFF
SIMSTARTTIME 0
SIMSCALEJOBRUNTIME FALSE
SIMFLAGS
SIMJOBSUBMISSIONPOLICY CONSTANTJOBDEPTH
SIMINITIALQUEUEDEPTH 16
SIMWCACCURACY 0.00
SIMWCACCURACYCHANGE 0.00
SIMNODECOUNT 0
SIMNODECONFIGURATION NORMAL
SIMWCSCALINGPERCENT 100
SIMCOMRATE 0.10
SIMCOMTYPE ROUNDROBIN
COMINTRAFRAMECOST 0.30
COMINTERFRAMECOST 0.30
SIMSTOPITERATION -1
SIMEXITITERATION -1

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
