Hi all,
I am new to installing Torque PBS and Maui. My system is a single dual-processor, dual-core server for testing purposes, where I try things out before getting the actual cluster. I have installed Torque PBS, and it seems to work fine. Then I installed Maui and used the maui.cfg file shown below; aside from specifying that the queue system is PBS, I did not change anything.
Now the behavior is that I can start the 'maui' daemon, issue 'showq' and see the queue, but when I submit a job, the maui daemon seems to stop by itself. Then, when I issue 'showq', I get:
[EMAIL PROTECTED] 1proc]$ showq
ERROR: cannot send request to server localhost.localdomain:42559 (server may not be running)
ERROR: cannot request service (status)
I have appended the lines generated in maui.log below.
The job itself runs fine, and I can also submit several jobs, which simply run in the order submitted. I can also restart maui and repeat this procedure.
Does anybody have an idea where I should look to figure out what is wrong? I would be grateful for any hints on how to get started.
Best, Berit
--------------------------------------
Berit Hinnemann
Research Scientist
Haldor Topsøe A/S
---------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------
output from maui.log upon submitting a job
12/13 16:23:35 INFO: scheduling complete. sleeping 30 seconds
12/13 16:24:06 ServerProcessRequests()
12/13 16:24:06 INFO: not rolling logs (585245 < 10000000)
12/13 16:24:06 MResAdjust(NULL,0,0)
12/13 16:24:06 MStatInitializeActiveSysUsage()
12/13 16:24:06 MStatClearUsage([NONE],Active)
12/13 16:24:06 ServerUpdate()
12/13 16:24:06 MSysUpdateTime()
12/13 16:24:06 INFO: starting iteration 7
12/13 16:24:06 MRMGetInfo()
12/13 16:24:06 MClusterClearUsage()
12/13 16:24:06 MRMClusterQuery()
12/13 16:24:06 MPBSClusterQuery(localhost.localdomain,RCount,SC)
12/13 16:24:06 __MPBSGetNodeState(Name,State,PNode)
12/13 16:24:06 INFO: PBS node localhost.localdomain set to state Busy (job-exclusive)
12/13 16:24:06 INFO: node 'localhost.localdomain' changed states from Idle to Busy
12/13 16:24:06 ALERT: unexpected node transition on node 'localhost.localdomain' Idle -> Busy
12/13 16:24:06 MPBSNodeUpdate(localhost.localdomain,localhost.localdomain,Busy,localhost.localdomain)
12/13 16:24:06 INFO: node localhost.localdomain has joblist '0/10.localhost.localdomain, 1/10.localhost.localdomain, 2/10.localhost.localdomain, 3/10.localhost.localdomain'
12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
12/13 16:24:06 MPBSLoadQueueInfo(localhost.localdomain,localhost.localdomain,SC)
12/13 16:24:06 INFO: queue 'batch' started state set to True
12/13 16:24:06 INFO: class to node not mapping enabled for queue 'batch' adding class to all nodes
12/13 16:24:06 INFO: 1 PBS resources detected on RM localhost.localdomain
12/13 16:24:06 INFO: resources detected: 1
12/13 16:24:06 MRMWorkloadQuery()
12/13 16:24:06 MPBSWorkloadQuery(localhost.localdomain,JCount,SC)
12/13 16:24:06 MPBSJobLoad(10,10.localhost.localdomain,J,TaskList,0)
12/13 16:24:06 MReqCreate(10,SrcRQ,DstRQ,DoCreate)
12/13 16:24:06 INFO: processing node request line '1:ppn=4'
12/13 16:24:06 MJobSetCreds(10,behi,behi,)
12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
12/13 16:24:06 MResJCreate(10,MNodeList,-00:00:10,ActiveJob,Res)
12/13 16:24:06 MStatUpdateActiveJobUsage(10)
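
For anyone triaging a log like the one above, a small script can help separate the repeated ALERT lines from the routine trace output and show the last line written before the daemon stopped. This is only a rough sketch: it assumes every message line has the "MM/DD HH:MM:SS LEVEL: text" layout seen in this excerpt, and the function name summarize_alerts is made up for illustration, not part of Maui.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above: "MM/DD HH:MM:SS LEVEL: message".
# Function-trace lines such as "MRMGetInfo()" have no "LEVEL:" part and do not match.
LINE_RE = re.compile(r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2}\s+([A-Z]+):\s+(.*)$")


def summarize_alerts(lines):
    """Count distinct non-INFO messages and remember the last non-empty line."""
    counts = Counter()
    last_line = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip():
            last_line = line
        match = LINE_RE.match(line)
        if match and match.group(1) != "INFO":
            counts[(match.group(1), match.group(2))] += 1
    return counts, last_line


# Example use (the path is an assumption; LOGFILE in maui.cfg gives the name):
#   with open("maui.log", errors="replace") as fh:
#       counts, last_line = summarize_alerts(fh)
#   for (level, text), n in counts.most_common():
#       print(n, level, text)
#   print("last line written:", last_line)
```

The last line written is worth noting because, if the daemon exits abnormally, the function named there is a plausible place to start looking.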
---------------------------------------------------------------------------------------------------------------------------------------
maui.cfg
# maui.cfg 3.2.6p18
SERVERHOST localhost.localdomain
# primary admin must be first in list
ADMIN1 root
# Resource Manager Definition
RMCFG[localhost.localdomain] TYPE=PBS
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL
# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1
# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
#FSPOLICY PSDEDICATED
#FSDEPTH 7
#FSINTERVAL 86400
#FSDECAY 0.80
# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED
# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY MINRESOURCE
# QOS: http://supercluster.org/mauidocs/7.3qos.html
# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00
# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR
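
To confirm whether the daemon is still reachable without relying on showq, one option is to read SERVERHOST and SERVERPORT from the configuration above and try a plain TCP connection. The sketch below assumes the client commands talk to the daemon over TCP at that host and port, as the showq error message suggests; read_maui_cfg and is_listening are hypothetical helper names, not Maui tools.

```python
import socket


def read_maui_cfg(lines):
    """Return {PARAMETER: value} for uncommented 'NAME value' lines."""
    params = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented-out parameters
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def is_listening(host, port, timeout=2.0):
    """True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use (the file location is an assumption):
#   with open("maui.cfg") as fh:
#       cfg = read_maui_cfg(fh)
#   print(is_listening(cfg["SERVERHOST"], int(cfg["SERVERPORT"])))
```

Running the check once before and once after submitting a job would show whether the port really closes at that moment.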
_______________________________________________ mauiusers mailing list [email protected] http://www.supercluster.org/mailman/listinfo/mauiusers
