I still haven't received an answer on this, so I am attaching the log and
configs here.
The question is:
Everything is working fine, so why does Maui report so many errors like
"cannot query events on RM"?
canceljob, showstart, showq, and all the other Maui commands work fine, and I
can manage my SLURM jobs without problems.
Thanks in advance.
Rafael
On Mon, 2008-11-17 at 15:54 -0200, Rafael Folco wrote:
> Hi all,
>
> I have Maui/SLURM working fine and I am able to run SLURM jobs with the srun
> command. However, I see lots of errors in the maui.log file.
>
> 11/17 10:42:41 MRMCheckEvents()
> 11/17 10:42:41 ALERT: cannot query events on RM (RM 'cluster-ib-1'
> does not support function 'rmeventquery')
> 11/17 10:42:41 MSUAcceptClient(5,ClientSD,HostName,TCP)
> 11/17 10:42:41 INFO: accept call failed, errno: 11 (Resource
> temporarily unavailable)
> 11/17 10:42:41 INFO: all clients connected. servicing requests
>
> Any clue?
>
> Thanks,
>
> Rafael
--
Rafael Folco
Brazil Test Lead
IBM Linux Technology Center
E-Mail: [EMAIL PROTECTED]
# maui.cfg 3.2.6p20
SERVERHOST cluster1
# primary admin must be first in list
ADMIN1 root slurm
# Resource Manager Definition
RMCFG[cluster1] TYPE=WIKI
RMPORT 7321
RMHOST cluster1
RMAUTHTYPE[cluster1] NONE
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:05
SERVERPORT 42559
SERVERMODE NORMAL
# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 9
# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1
# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
#FSPOLICY PSDEDICATED
#FSDEPTH 7
#FSINTERVAL 86400
#FSDECAY 0.80
# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED
# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY MINRESOURCE
# QOS: http://supercluster.org/mauidocs/7.3qos.html
# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00
# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR
PARTITIONMODE ON
NODECFG[cluster1] PARTITION=openhpc
NODECFG[cluster2] PARTITION=openhpc
NODECFG[cluster3] PARTITION=openhpc
NODECFG[cluster4] PARTITION=openhpc
NODECFG[cluster5] PARTITION=openhpc
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=cluster1
ControlAddr=10.1.1.10
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#FirstJobId=1
JobCredentialPrivateKey=/etc/slurm/slurm.key
JobCredentialPublicCertificate=/etc/slurm/slurm.cert
#JobFileAppend=0
#JobRequeue=1
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
MpiDefault=none
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/pgid
#Prolog=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/tmp
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TmpFs=/tmp
#TreeWidth=
#UnkillableStepProgram=
#UnkillableStepTimeout=
#UsePAM=0
#
#
# TIMERS
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
MinJobAge=300
KillWait=30
#MessageTimeout=10
SlurmctldTimeout=300
SlurmdTimeout=300
#UnkillableStepProgram=
#UnkillableStepTimeout=60
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/wiki
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
ClusterName=cluster
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=9
SlurmctldLogFile=/tmp/slurm/log/slurmctld.log
SlurmdDebug=9
SlurmdLogFile=/tmp/slurm/log/slurmd.log
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=cluster[1-5] Procs=1 State=UNKNOWN
PartitionName=openhpc Nodes=cluster[1-5] Default=YES MaxTime=INFINITE State=UP
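
A minimal sketch (not part of the original attachments) for cross-checking the two config files above. It assumes the only things worth comparing are that SLURM uses SchedulerType=sched/wiki and that Maui's RMPORT equals SLURM's SchedulerPort, which is how the files in this post are set up. The function names and the inlined excerpts are illustrative only.

```python
def parse_maui(text):
    """Parse 'KEY VALUE' lines from maui.cfg, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def parse_slurm(text):
    """Parse 'Key=Value' lines from slurm.conf, skipping comments and blanks.

    Multi-field lines (NodeName=..., PartitionName=...) are stored under
    their first key only; that is enough for this check.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_wiki_link(maui, slurm):
    """Return a list of mismatches between the two parsed configs."""
    problems = []
    if slurm.get("SchedulerType") != "sched/wiki":
        problems.append("SchedulerType is not sched/wiki")
    if maui.get("RMPORT") != slurm.get("SchedulerPort"):
        problems.append("RMPORT and SchedulerPort differ")
    return problems


# Excerpts copied from the attached files.
MAUI_CFG = """\
RMCFG[cluster1] TYPE=WIKI
RMPORT 7321
RMHOST cluster1
"""

SLURM_CONF = """\
SchedulerType=sched/wiki
SchedulerPort=7321
SelectType=select/linear
"""

problems = check_wiki_link(parse_maui(MAUI_CFG), parse_slurm(SLURM_CONF))
print(problems or "wiki port and scheduler type agree")
```

With the values from this post the list of problems is empty, which matches the log below: Maui connects to port 7321 and receives the node and job lists.
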
11/19 10:52:55 MRMCheckEvents()
11/19 10:52:55 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:55 MSUAcceptClient(6,ClientSD,HostName,TCP)
11/19 10:52:55 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:55 INFO: all clients connected. servicing requests
11/19 10:52:56 ServerProcessRequests()
11/19 10:52:56 MLogRoll(NULL,0,1)
11/19 10:52:56 INFO: not rolling logs (8001909 < 10000000)
11/19 10:52:56 MResAdjust(NULL,0,0)
11/19 10:52:56 MJobSetAttr(,PAL,Value,1,2)
11/19 10:52:56 INFO: job flags for job : 0, req napolicy=SHARED
11/19 10:52:56 MJobSetAttr(,GAttr,Value,1,5)
11/19 10:52:56 MUMAGetBM(JFeature,PREEMPTEE,3)
11/19 10:52:56 INFO: attribute 'PREEMPTEE' cleared for job
11/19 10:52:56 MStatInitializeActiveSysUsage()
11/19 10:52:56 MStatClearUsage([NONE],Active)
11/19 10:52:56 INFO: clearing usage stats for user [ALL]
11/19 10:52:56 INFO: clearing usage stats for user root
11/19 10:52:56 INFO: clearing usage stats for user DEFAULT
11/19 10:52:56 INFO: clearing usage stats for group NOGROUP
11/19 10:52:56 INFO: clearing usage stats for group [ALL]
11/19 10:52:56 INFO: clearing usage stats for group root
11/19 10:52:56 INFO: clearing usage stats for group DEFAULT
11/19 10:52:56 INFO: clearing usage stats for acct QA_ACCT
11/19 10:52:56 INFO: clearing usage stats for acct ALL
11/19 10:52:56 INFO: clearing usage stats for acct MY_ACCT
11/19 10:52:56 INFO: clearing usage stats for acct [ALL]
11/19 10:52:56 INFO: clearing usage stats for acct DEFAULT
11/19 10:52:56 INFO: clearing usage stats for qos DEFAULT
11/19 10:52:56 INFO: clearing usage stats for qos [ALL]
11/19 10:52:56 INFO: clearing usage stats for class [NONE]
11/19 10:52:56 INFO: clearing usage stats for class [ALL]
11/19 10:52:56 INFO: clearing usage stats for par ALL
11/19 10:52:56 INFO: clearing usage stats for par DEFAULT
11/19 10:52:56 INFO: clearing usage stats for par openhpc
11/19 10:52:56 ServerUpdate()
11/19 10:52:56 MSysUpdateTime()
11/19 10:52:56 INFO: starting iteration 1
11/19 10:52:56 MSchedProcessJobs()
11/19 10:52:56 MRMGetInfo()
11/19 10:52:56 MClusterClearUsage()
11/19 10:52:56 MRMClusterQuery()
11/19 10:52:56 MWikiClusterLoadInfo(cluster1,RCount,EMsg,SC)
11/19 10:52:56 MWikiDoCommand(cluster1,7321,9000000,NONE,CMD=GETNODES ARG=0:ALL,Data,DataSize,SC)
11/19 10:52:56 MSUConnect(S,FALSE,EMsg)
11/19 10:52:56 INFO: trying to connect to 10.1.1.10 (Port: 7321)
11/19 10:52:56 INFO: non-blocking mode established
11/19 10:52:56 MSUSelectWrite(7,9000000)
11/19 10:52:56 INFO: successful connect to TCP server (sd: 7)
11/19 10:52:56 MSUSendData(S,9000000,FALSE,FALSE)
11/19 10:52:56 INFO: header created '00000022
'
11/19 10:52:56 INFO: sending short packet '00000022
CMD=GETNODES ARG=0:ALL'
11/19 10:52:56 MSUSendPacket(7,Buf,31,9000000,SC)
11/19 10:52:56 INFO: sending packet '00000022
CMD=GETNODES ARG=0:ALL'
11/19 10:52:56 MSUSelectWrite(7,9000000)
11/19 10:52:56 INFO: packet sent (31 bytes of 31)
11/19 10:52:56 INFO: command sent to server
11/19 10:52:56 INFO: message sent: 'CMD=GETNODES ARG=0:ALL'
11/19 10:52:56 MSURecvData(S,9000000,FALSE,SC,EMsg)
11/19 10:52:56 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
11/19 10:52:56 MSUSelectRead(7,9000000)
11/19 10:52:56 INFO: 9 of 9 bytes read from sd 7
11/19 10:52:56 INFO: message '00000385
' read
11/19 10:52:56 MSURecvPacket(7,BufP,385,NULL,9000000,SC)
11/19 10:52:56 MSUSelectRead(7,9000000)
11/19 10:52:56 INFO: 385 of 385 bytes read from sd 7
11/19 10:52:56 INFO: message 'CK=b3f391fa8bda4e98 TS=1227113576 AUTH=slurm DT=SC=0 ARG=5#cluster1:STATE=Idle;ARCH=ppc64;OS=Linux;CMEMORY=1;CDISK=0;CPROC=1;#cluster2:STATE=Busy;CMEMORY=1;CDISK=0;CPROC=1;#cluster3:STATE=Busy;CMEMORY=1;CDISK=0;CPROC=1;#cluster4:STATE=Busy;CMEMORY=1;CDISK=0;CPROC=1;#cluster5:STATE=Idle;ARCH=ppc64;OS=Linux;CMEMORY=1;CDISK=0;CPROC=1;' read
11/19 10:52:56 INFO: received message 'CK=b3f391fa8bda4e98 TS=1227113576 AUTH=slurm DT=SC=0 ARG=5#cluster1:STATE=Idle;ARCH=ppc64;OS=Linux;CMEMORY=1;CDISK=0;CPROC=1;#cluster2:STATE=Busy;CMEMORY=1;CDISK=0;CPROC=1;#cluster3:STATE=Busy;CMEMORY=1;CDISK=0;CPROC=1;#cluster4:STATE=Busy;CMEMORY=1;CDISK=0;CPROC=1;#cluster5:STATE=Idle;ARCH=ppc64;OS=Linux;CMEMORY=1;CDISK=0;CPROC=1;' from wiki server
11/19 10:52:56 MSUDisconnect(S)
11/19 10:52:56 INFO: received node list through WIKI RM
11/19 10:52:56 INFO: loading 5 node(s)
11/19 10:52:56 MWikiGetAttr(node,Name,Status,Attr,Start)
11/19 10:52:56 MUMAGetIndex(NodeState,Idle,ADD)
11/19 10:52:56 MNodeFind(cluster1,N)
11/19 10:52:56 MRMNodePreUpdate(cluster1,Idle,cluster1)
11/19 10:52:56 MWikiNodeUpdate(AList,cluster1)
11/19 10:52:56 MWikiNodeUpdateAttr(STATE=Idle,cluster1)
11/19 10:52:56 MUMAGetIndex(NodeState,Idle,ADD)
11/19 10:52:56 MWikiNodeUpdateAttr(ARCH=ppc64,cluster1)
11/19 10:52:56 MUMAGetIndex(Arch,ppc64,ADD)
11/19 10:52:56 MWikiNodeUpdateAttr(OS=Linux,cluster1)
11/19 10:52:56 MUMAGetIndex(Opsys,Linux,ADD)
11/19 10:52:56 MWikiNodeUpdateAttr(CMEMORY=1,cluster1)
11/19 10:52:56 MWikiNodeUpdateAttr(CDISK=0,cluster1)
11/19 10:52:56 MWikiNodeUpdateAttr(CPROC=1,cluster1)
11/19 10:52:56 MRMNodePostUpdate(cluster1,Idle)
11/19 10:52:56 MWikiGetAttr(node,Name,Status,Attr,Start)
11/19 10:52:56 MUMAGetIndex(NodeState,Busy,ADD)
11/19 10:52:56 MNodeFind(cluster2,N)
11/19 10:52:56 MRMNodePreUpdate(cluster2,Busy,cluster1)
11/19 10:52:56 MWikiNodeUpdate(AList,cluster2)
11/19 10:52:56 MWikiNodeUpdateAttr(STATE=Busy,cluster2)
11/19 10:52:56 MUMAGetIndex(NodeState,Busy,ADD)
11/19 10:52:56 MWikiNodeUpdateAttr(CMEMORY=1,cluster2)
11/19 10:52:56 MWikiNodeUpdateAttr(CDISK=0,cluster2)
11/19 10:52:56 MWikiNodeUpdateAttr(CPROC=1,cluster2)
11/19 10:52:56 MRMNodePostUpdate(cluster2,Busy)
11/19 10:52:56 MWikiGetAttr(node,Name,Status,Attr,Start)
11/19 10:52:56 MUMAGetIndex(NodeState,Busy,ADD)
11/19 10:52:56 MNodeFind(cluster3,N)
11/19 10:52:56 MRMNodePreUpdate(cluster3,Busy,cluster1)
11/19 10:52:56 MWikiNodeUpdate(AList,cluster3)
11/19 10:52:56 MWikiNodeUpdateAttr(STATE=Busy,cluster3)
11/19 10:52:56 MUMAGetIndex(NodeState,Busy,ADD)
11/19 10:52:56 MWikiNodeUpdateAttr(CMEMORY=1,cluster3)
11/19 10:52:56 MWikiNodeUpdateAttr(CDISK=0,cluster3)
11/19 10:52:56 MWikiNodeUpdateAttr(CPROC=1,cluster3)
11/19 10:52:56 MRMNodePostUpdate(cluster3,Busy)
11/19 10:52:56 MWikiGetAttr(node,Name,Status,Attr,Start)
11/19 10:52:56 MUMAGetIndex(NodeState,Busy,ADD)
11/19 10:52:56 MNodeFind(cluster4,N)
11/19 10:52:56 MRMNodePreUpdate(cluster4,Busy,cluster1)
11/19 10:52:56 MWikiNodeUpdate(AList,cluster4)
11/19 10:52:56 MWikiNodeUpdateAttr(STATE=Busy,cluster4)
11/19 10:52:56 MUMAGetIndex(NodeState,Busy,ADD)
11/19 10:52:56 MWikiNodeUpdateAttr(CMEMORY=1,cluster4)
11/19 10:52:56 MWikiNodeUpdateAttr(CDISK=0,cluster4)
11/19 10:52:56 MWikiNodeUpdateAttr(CPROC=1,cluster4)
11/19 10:52:56 MRMNodePostUpdate(cluster4,Busy)
11/19 10:52:56 MWikiGetAttr(node,Name,Status,Attr,Start)
11/19 10:52:56 MUMAGetIndex(NodeState,Idle,ADD)
11/19 10:52:56 MNodeFind(cluster5,N)
11/19 10:52:56 MRMNodePreUpdate(cluster5,Idle,cluster1)
11/19 10:52:56 MWikiNodeUpdate(AList,cluster5)
11/19 10:52:56 MWikiNodeUpdateAttr(STATE=Idle,cluster5)
11/19 10:52:56 MUMAGetIndex(NodeState,Idle,ADD)
11/19 10:52:56 MWikiNodeUpdateAttr(ARCH=ppc64,cluster5)
11/19 10:52:56 MUMAGetIndex(Arch,ppc64,ADD)
11/19 10:52:56 MWikiNodeUpdateAttr(OS=Linux,cluster5)
11/19 10:52:56 MUMAGetIndex(Opsys,Linux,ADD)
11/19 10:52:56 MWikiNodeUpdateAttr(CMEMORY=1,cluster5)
11/19 10:52:56 MWikiNodeUpdateAttr(CDISK=0,cluster5)
11/19 10:52:56 MWikiNodeUpdateAttr(CPROC=1,cluster5)
11/19 10:52:56 MRMNodePostUpdate(cluster5,Idle)
11/19 10:52:56 INFO: 5 WIKI resources detected on RM cluster1
11/19 10:52:56 INFO: resources detected: 5
11/19 10:52:56 MRMWorkloadQuery()
11/19 10:52:56 MWikiWorkloadQuery(cluster1,JCount,SC)
11/19 10:52:56 MWikiDoCommand(cluster1,7321,9000000,NONE,CMD=GETJOBS ARG=0:ALL,Data,DataSize,SC)
11/19 10:52:56 MSUConnect(S,FALSE,EMsg)
11/19 10:52:56 INFO: trying to connect to 10.1.1.10 (Port: 7321)
11/19 10:52:56 INFO: non-blocking mode established
11/19 10:52:56 MSUSelectWrite(7,9000000)
11/19 10:52:56 INFO: successful connect to TCP server (sd: 7)
11/19 10:52:56 MSUSendData(S,9000000,FALSE,FALSE)
11/19 10:52:56 INFO: header created '00000021
'
11/19 10:52:56 INFO: sending short packet '00000021
CMD=GETJOBS ARG=0:ALL'
11/19 10:52:56 MSUSendPacket(7,Buf,30,9000000,SC)
11/19 10:52:56 INFO: sending packet '00000021
CMD=GETJOBS ARG=0:ALL'
11/19 10:52:56 MSUSelectWrite(7,9000000)
11/19 10:52:56 INFO: packet sent (30 bytes of 30)
11/19 10:52:56 INFO: command sent to server
11/19 10:52:56 INFO: message sent: 'CMD=GETJOBS ARG=0:ALL'
11/19 10:52:56 MSURecvData(S,9000000,FALSE,SC,EMsg)
11/19 10:52:56 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
11/19 10:52:56 MSUSelectRead(7,9000000)
11/19 10:52:56 INFO: 9 of 9 bytes read from sd 7
11/19 10:52:56 INFO: message '00000310
' read
11/19 10:52:56 MSURecvPacket(7,BufP,310,NULL,9000000,SC)
11/19 10:52:56 MSUSelectRead(7,9000000)
11/19 10:52:56 INFO: 310 of 310 bytes read from sd 7
11/19 10:52:56 INFO: message 'CK=1c984a87ea3c48fc TS=1227113576 AUTH=slurm DT=SC=0 ARG=1#2:STATE=Completed;UPDATETIME=1227108064;WCLIMIT=31536000;TASKS=5;DPROCS=1;QUEUETIME=1227107968;STARTTIME=1227107969;PARTITIONMASK=openhpc;RMEM=0;RDISK=0;COMPLETETIME=1227108069;UNAME=root;GNAME=root;' read
11/19 10:52:56 INFO: received message 'CK=1c984a87ea3c48fc TS=1227113576 AUTH=slurm DT=SC=0 ARG=1#2:STATE=Completed;UPDATETIME=1227108064;WCLIMIT=31536000;TASKS=5;DPROCS=1;QUEUETIME=1227107968;STARTTIME=1227107969;PARTITIONMASK=openhpc;RMEM=0;RDISK=0;COMPLETETIME=1227108069;UNAME=root;GNAME=root;' from wiki server
11/19 10:52:56 MSUDisconnect(S)
11/19 10:52:56 INFO: received job list through WIKI RM
11/19 10:52:56 INFO: loading 1 job(s)
11/19 10:52:56 MWikiGetAttr(job,Name,Status,Attr,Start)
11/19 10:52:56 MUMAGetIndex(JobState,Completed,ADD)
11/19 10:52:56 MJobFind('2',J,0)
11/19 10:52:56 MUGetHash(2)
11/19 10:52:56 INFO: hash '2' --> 107940
11/19 10:52:56 INFO: job '2' hash 1444
11/19 10:52:56 INFO: ignoring job '2' (state: Completed)
11/19 10:52:56 INFO: 1 WIKI jobs detected on RM cluster1
11/19 10:52:56 INFO: jobs detected: 1
11/19 10:52:56 MStatClearUsage(node,Active)
11/19 10:52:56 INFO: clearing usage stats for acct QA_ACCT
11/19 10:52:56 INFO: clearing usage stats for acct ALL
11/19 10:52:56 INFO: clearing usage stats for acct MY_ACCT
11/19 10:52:56 INFO: clearing usage stats for acct [ALL]
11/19 10:52:56 INFO: clearing usage stats for acct DEFAULT
11/19 10:52:56 INFO: clearing usage stats for class [NONE]
11/19 10:52:56 INFO: clearing usage stats for class [ALL]
11/19 10:52:56 MClusterUpdateNodeState()
11/19 10:52:56 INFO: node 'cluster1' C/A/D procs: 1/1/0
11/19 10:52:56 INFO: node 'cluster2' C/A/D procs: 1/0/0
11/19 10:52:56 INFO: node 'cluster3' C/A/D procs: 1/0/0
11/19 10:52:56 INFO: node 'cluster4' C/A/D procs: 1/0/0
11/19 10:52:56 INFO: node 'cluster5' C/A/D procs: 1/1/0
11/19 10:52:56 MParUpdate(ALL)
11/19 10:52:56 INFO: P[ALL]: Total 5:5 Up 5:5 Idle 2:2 Active 0:0
11/19 10:52:56 INFO: MNode[cluster1] added to MPar[openhpc] (1:1)
11/19 10:52:56 INFO: MNode[cluster2] added to MPar[openhpc] (0:1)
11/19 10:52:56 INFO: MNode[cluster3] added to MPar[openhpc] (0:1)
11/19 10:52:56 INFO: MNode[cluster4] added to MPar[openhpc] (0:1)
11/19 10:52:56 INFO: MNode[cluster5] added to MPar[openhpc] (1:1)
11/19 10:52:56 INFO: P[ALL]: Total 5:5 Up 5:5 Idle 2:2 Active 0:0
11/19 10:52:56 INFO: jobs in queue
11/19 10:52:56 MResAdjustDRes(NULL,FALSE)
11/19 10:52:56 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
11/19 10:52:56 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
11/19 10:52:56 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
11/19 10:52:56 INFO: idle job queue is empty on iteration 1
11/19 10:52:56 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
11/19 10:52:56 INFO: idle job queue is empty on iteration 1
11/19 10:52:56 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,TRUE)
11/19 10:52:56 INFO: idle job queue is empty on iteration 1
11/19 10:52:56 INFO: cannot finalize RM cycle (RM 'cluster1' does not support function 'cyclefinalize')
11/19 10:52:56 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
11/19 10:52:56 INFO: idle job queue is empty on iteration 1
11/19 10:52:56 MSchedUpdateStats()
11/19 10:52:56 INFO: iteration: 1 scheduling time: 0.001 seconds
11/19 10:52:56 MResUpdateStats()
11/19 10:52:56 INFO: current util[1]: 3/5 (60.00%) PH: 35.08% active jobs: 0 of 0 (completed: 2641)
11/19 10:52:56 MQueueCheckStatus()
11/19 10:52:56 MNodeCheckStatus()
11/19 10:52:56 INFO: checking node 'cluster1'
11/19 10:52:56 INFO: checking node 'cluster2'
11/19 10:52:56 INFO: checking node 'cluster3'
11/19 10:52:56 INFO: checking node 'cluster4'
11/19 10:52:56 INFO: checking node 'cluster5'
11/19 10:52:56 MSysCheck()
11/19 10:52:56 MLimitEnforceAll(ALL)
11/19 10:52:56 MUClearChild(PID)
11/19 10:52:56 MParUpdate(ALL)
11/19 10:52:56 INFO: P[ALL]: Total 5:5 Up 5:5 Idle 2:2 Active 0:0
11/19 10:52:56 INFO: MNode[cluster1] added to MPar[openhpc] (1:1)
11/19 10:52:56 INFO: MNode[cluster2] added to MPar[openhpc] (0:1)
11/19 10:52:56 INFO: MNode[cluster3] added to MPar[openhpc] (0:1)
11/19 10:52:56 INFO: MNode[cluster4] added to MPar[openhpc] (0:1)
11/19 10:52:56 INFO: MNode[cluster5] added to MPar[openhpc] (1:1)
11/19 10:52:56 INFO: P[ALL]: Total 5:5 Up 5:5 Idle 2:2 Active 0:0
11/19 10:52:56 MResCheckStatus(NULL)
11/19 10:52:56 INFO: scheduling complete. sleeping 5 seconds
11/19 10:52:56 UIProcessClients(5,5)
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:57 MRMCheckEvents()
11/19 10:52:57 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:57 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:57 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:57 INFO: all clients connected. servicing requests
11/19 10:52:58 MRMCheckEvents()
11/19 10:52:58 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:58 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:58 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:58 INFO: all clients connected. servicing requests
11/19 10:52:58 MRMCheckEvents()
11/19 10:52:58 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:58 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:58 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:58 INFO: all clients connected. servicing requests
11/19 10:52:58 MRMCheckEvents()
11/19 10:52:58 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:58 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:58 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:58 INFO: all clients connected. servicing requests
11/19 10:52:58 MRMCheckEvents()
11/19 10:52:58 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:58 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:58 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:58 INFO: all clients connected. servicing requests
11/19 10:52:58 MRMCheckEvents()
11/19 10:52:58 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:58 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:58 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:58 INFO: all clients connected. servicing requests
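
To show how often each message repeats in a log like the one above, here is a minimal sketch that tallies the ALERT and INFO lines. It assumes every tagged line has the form "MM/DD HH:MM:SS LEVEL: message", as in this excerpt; function-trace lines such as MRMCheckEvents() are skipped. The name tally and the inlined sample are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as "11/19 10:52:56 ALERT: cannot query events ...".
# Function-trace lines have no "LEVEL:" tag and therefore do not match.
LINE_RE = re.compile(r"^\d\d/\d\d \d\d:\d\d:\d\d ([A-Z]+):\s+(.*)$")


def tally(lines):
    """Count (level, message) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines copied from the log above.
SAMPLE = """\
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
11/19 10:52:56 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/19 10:52:56 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/19 10:52:56 INFO: all clients connected. servicing requests
11/19 10:52:56 MRMCheckEvents()
11/19 10:52:56 ALERT: cannot query events on RM (RM 'cluster1' does not support function 'rmeventquery')
""".splitlines()

counts = tally(SAMPLE)
for (level, message), n in counts.most_common():
    print(f"{n:4d} {level:5s} {message}")
```

To run it on the real file, pass an open file object instead of SAMPLE, for example tally(open("maui.log")); the path is an assumption based on the LOGFILE setting in maui.cfg.
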
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers