I am running Maui with SLURM 1.3.6. The Maui log shows the following errors and alerts:

11/03 23:56:40 ERROR: command 'CMD=GETNODES ARG=0:ALL' SC: -300 response: 'NONE'
11/03 23:56:40 ALERT: cannot get node list from WIKI RM
11/03 23:56:40 ALERT: cannot load cluster resources on RM (RM 'p6ihopenhpc-ib-3' failed in function 'clusterquery')
11/03 23:56:40 WARNING: no resources detected

Can someone tell me what is wrong with my Maui and SLURM configuration?

File maui.cfg:
-------------------------------------
# maui.cfg 3.2.6p20

SERVERHOST p6ihopenhpc-ib-3
# primary admin must be first in list
ADMIN1 root

# Resource Manager Definition
RMCFG[p6ihopenhpc-ib-3] TYPE=WIKI
RMPORT 7321
RMHOST p6ihopenhpc-ib-3
RMAUTHTYPE[p6ihopenhpc-ib-3] MUNGE

# Allocation Manager Definition
AMCFG[bank] TYPE=NONE

# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:20

SERVERPORT 42559
SERVERMODE NORMAL

# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1

# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
#FSPOLICY PSDEDICATED
#FSDEPTH 7
#FSINTERVAL 86400
#FSDECAY 0.80

# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST

# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY MINRESOURCE

# QOS: http://supercluster.org/mauidocs/7.3qos.html
# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE

# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00

# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR

PARTITIONMODE ON
NODECFG[p6ihopenhpc-ib-3] PARTITION=debug
NODECFG[p6ihopenhpc-ib-4] PARTITION=debug
NODECFG[p6ihopenhpc-ib-5] PARTITION=debug
NODECFG[p6ihopenhpc-ib-6] PARTITION=debug
====================================================

File slurm.conf:
-------------------------------------
# slurm.conf file generated by configurator.html.
# See the slurm.conf man page for more information.
#
ControlMachine=p6ihopenhpc-ib-3
ControlAddr=10.2.1.30
BackupController=p6ihopenhpc-ib-1
BackupAddr=10.2.1.10
#
AuthType=auth/munge
#AuthType=auth/none
CacheGroups=0
#CheckpointType=checkpoint/none
#CryptoType=crypto/openssl
CryptoType=crypto/munge
#Epilog=
#FirstJobId=1
JobCredentialPrivateKey=/etc/slurm/slurm.key
JobCredentialPublicCertificate=/etc/slurm/slurm.cert
#JobFileAppend=0
#JobRequeue=1
#MailProg=/bin/mail
#MaxJobCount=5000
MpiDefault=none
#PluginDir=
#PlugStackConfig=
#PrivateData=0
ProctrackType=proctrack/pgid
#Prolog=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/tmp
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TmpFs=/tmp
#TreeWidth=
#UnkillableStepProgram=
#UnkillableStepTimeout=
#UsePAM=0
#
#
# TIMERS
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
MinJobAge=300
KillWait=30
#MessageTimeout=10
SlurmctldTimeout=300
SlurmdTimeout=300
#UnkillableStepProgram=
#UnkillableStepTimeout=60
Waittime=0
#
#
# SCHEDULING
#DefMemPerTask=0
FastSchedule=1
#MaxMemPerTask=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
#SchedulerType=sched/backfill
SchedulerType=sched/wiki
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
#AccountingStorageType=jobacct_storage/none
#AccountingStorageUser=
ClusterName=cluster
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobAcctGatherFrequency=
#JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/tmp/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/tmp/slurm/slurmd.log
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=p6ihopenhpc-ib-[3-6] Procs=1 State=UNKNOWN
PartitionName=debug Nodes=p6ihopenhpc-ib-[3-6] Default=YES MaxTime=INFINITE State=UP
-------------------------------------

Regards,
Hien Nguyen
Linux Technology Center (Austin)
Phone: (512) 838-4140 Tie Line: 678-4140
e-mail: [EMAIL PROTECTED]
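
As a way to cross-check the two files quoted above, here is a minimal Python sketch that compares the settings the wiki interface is usually expected to share: SchedulerPort against RMPORT, and ControlMachine against RMHOST. The assumption that these pairs should agree, the helper names (parse_slurm_conf, parse_maui_cfg, cross_check) and the short excerpts are illustrative only and are not part of Maui or SLURM.

```python
import re

# Excerpts copied from the two config files quoted in this message.
SLURM_EXCERPT = """\
ControlMachine=p6ihopenhpc-ib-3
#SchedulerType=sched/backfill
SchedulerType=sched/wiki
SchedulerPort=7321
"""

MAUI_EXCERPT = """\
SERVERHOST p6ihopenhpc-ib-3
RMCFG[p6ihopenhpc-ib-3] TYPE=WIKI
RMPORT 7321
RMHOST p6ihopenhpc-ib-3
RMAUTHTYPE[p6ihopenhpc-ib-3] MUNGE
"""


def parse_slurm_conf(text):
    """Map each uncommented 'Key=Value' line to its first key/value pair."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        match = re.match(r"(\w+)=(\S*)", line)
        if match:
            settings.setdefault(match.group(1), match.group(2))
    return settings


def parse_maui_cfg(text):
    """Map each uncommented 'NAME[index] value' line to a list of values."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        match = re.match(r"([A-Z0-9]+)(?:\[[^\]]*\])?\s+(.+)", line)
        if match:
            params.setdefault(match.group(1), []).append(match.group(2).strip())
    return params


def cross_check(slurm, maui):
    """Return a list of mismatches between settings assumed to be paired."""
    problems = []
    # Assumption: these pairs are expected to hold the same value.
    pairs = [("SchedulerPort", "RMPORT"), ("ControlMachine", "RMHOST")]
    for slurm_key, maui_key in pairs:
        slurm_value = slurm.get(slurm_key)
        maui_value = maui.get(maui_key, [None])[0]
        if slurm_value != maui_value:
            problems.append(
                f"{slurm_key}={slurm_value} but {maui_key}={maui_value}"
            )
    return problems


if __name__ == "__main__":
    issues = cross_check(
        parse_slurm_conf(SLURM_EXCERPT), parse_maui_cfg(MAUI_EXCERPT)
    )
    print(issues if issues else "port and host settings agree")
```

An empty result only rules out a port or host mismatch between the two files; it does not by itself explain the GETNODES failure in the log.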
