Berit,

Try running Maui under a debugger such as gdb to see why it is shutting down. From your description, I would guess Maui is crashing, most likely with a segmentation fault. Instructions on how to run Maui under gdb can be found at http://www.clusterresources.com/products/maui/docs/14.6troubleshootingsystemerrors.shtml (Section 14.6.2.1).
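If it helps, here is a rough sketch of one way to capture the stack trace non-interactively. It is not taken from the Maui documentation: the helper name `build_gdb_command` is made up for illustration, it assumes a `maui` binary on your PATH (pass the full path otherwise), and the `follow-fork-mode` setting is only there on the assumption that Maui forks into the background when it starts. The gdb options used (`-batch`, `-ex`, `--args`) are standard gdb ones.

```python
import shlex
import subprocess
import sys


def build_gdb_command(binary="maui"):
    """Return a gdb batch command that runs `binary` and prints a backtrace.

    `build_gdb_command` is a hypothetical helper, not part of Maui or gdb.
    """
    return [
        "gdb", "-batch",
        # Assumption: maui forks into the background, so follow the child.
        "-ex", "set follow-fork-mode child",
        "-ex", "run",  # run until the program exits or crashes
        "-ex", "bt",   # print the stack trace at the point where it stopped
        "--args", binary,
    ]


if __name__ == "__main__":
    # Optional first argument: path to the maui binary.
    cmd = build_gdb_command(sys.argv[1] if len(sys.argv) > 1 else "maui")
    print("running:", shlex.join(cmd))
    # Blocks until maui stops; submit a job from another shell to trigger it.
    subprocess.run(cmd)
```

Redirecting the script's output to a file gives you something you can paste into a mail to the list.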
Please send the stack trace to the list so we can get this fixed.

Thanks,

--
Joshua Butikofer
Cluster Resources, Inc.
[EMAIL PROTECTED]
Voice: (801) 717-3707
Fax: (801) 717-3738
--------------------------

Berit Hinnemann wrote:
> Hi all,
>
> I am new to installing Torque PBS and Maui. My system is a one dual-processor
> dual-core server for testing purposes, where I try things out before getting
> the actual cluster. I have installed both Torque PBS and this seems to work
> fine. Then I installed Maui and used the file maui.cfg as below, aside from
> telling that the queue system is PBS I did not change anything.
>
> Now the behavior is that I can start the 'maui' demon, issue 'showq' and see
> the queue, but when I submit a job, the maui demon seems to stop by itself.
> Then, when I issue "showq" I get
>
> [EMAIL PROTECTED] 1proc]$ showq
> ERROR: cannot send request to server localhost.localdomain:42559 (server may not be running)
> ERROR: cannot request service (status)
>
> I have appended the lines generated in maui.log below.
> The job runs fine and I can also submit several jobs, which are just done in
> the order submitted. I can also restart maui and repeat this procedure.
>
> Does anybody have an idea where I should be looking to figure out what is
> wrong? I would be grateful on any hints on how to get started.
> Best, Berit
>
> --------------------------------------
> Berit Hinnemann
> Research Scientist
> Haldor Topsøe A/S
> ---------------------------------------
>
> ----------------------------------------
> output from maui.log upon submitting a job
> 12/13 16:23:35 INFO: scheduling complete. sleeping 30 seconds
> 12/13 16:24:06 ServerProcessRequests()
> 12/13 16:24:06 INFO: not rolling logs (585245 < 10000000)
> 12/13 16:24:06 MResAdjust(NULL,0,0)
> 12/13 16:24:06 MStatInitializeActiveSysUsage()
> 12/13 16:24:06 MStatClearUsage([NONE],Active)
> 12/13 16:24:06 ServerUpdate()
> 12/13 16:24:06 MSysUpdateTime()
> 12/13 16:24:06 INFO: starting iteration 7
> 12/13 16:24:06 MRMGetInfo()
> 12/13 16:24:06 MClusterClearUsage()
> 12/13 16:24:06 MRMClusterQuery()
> 12/13 16:24:06 MPBSClusterQuery(localhost.localdomain,RCount,SC)
> 12/13 16:24:06 __MPBSGetNodeState(Name,State,PNode)
> 12/13 16:24:06 INFO: PBS node localhost.localdomain set to state Busy (job-exclusive)
> 12/13 16:24:06 INFO: node 'localhost.localdomain' changed states from Idle to Busy
> 12/13 16:24:06 ALERT: unexpected node transition on node 'localhost.localdomain' Idle -> Busy
> 12/13 16:24:06 MPBSNodeUpdate(localhost.localdomain,localhost.localdomain,Busy,localhost.localdomain)
> 12/13 16:24:06 INFO: node localhost.localdomain has joblist '0/10.localhost.localdomain, 1/10.localhost.localdomain, 2/10.localhost.localdomain, 3/10.localhost.localdomain'
> 12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
> 12/13 16:24:06 MPBSLoadQueueInfo(localhost.localdomain,localhost.localdomain,SC)
> 12/13 16:24:06 INFO: queue 'batch' started state set to True
> 12/13 16:24:06 INFO: class to node not mapping enabled for queue 'batch' adding class to all nodes
> 12/13 16:24:06 INFO: 1 PBS resources detected on RM localhost.localdomain
> 12/13 16:24:06 INFO: resources detected: 1
> 12/13 16:24:06 MRMWorkloadQuery()
> 12/13 16:24:06 MPBSWorkloadQuery(localhost.localdomain,JCount,SC)
> 12/13 16:24:06 MPBSJobLoad(10,10.localhost.localdomain,J,TaskList,0)
> 12/13 16:24:06 MReqCreate(10,SrcRQ,DstRQ,DoCreate)
> 12/13 16:24:06 INFO: processing node request line '1:ppn=4'
> 12/13 16:24:06 MJobSetCreds(10,behi,behi,)
> 12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 MResJCreate(10,MNodeList,-00:00:10,ActiveJob,Res)
> 12/13 16:24:06 MStatUpdateActiveJobUsage(10)
> ----------------------------------------
> maui.cfg
> # maui.cfg 3.2.6p18
>
> SERVERHOST localhost.localdomain
> # primary admin must be first in list
> ADMIN1 root
>
> # Resource Manager Definition
>
> RMCFG[localhost.localdomain] TYPE=PBS
>
> # Allocation Manager Definition
>
> AMCFG[bank] TYPE=NONE
>
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
> # use the 'schedctl -l' command to display current configuration
>
> RMPOLLINTERVAL 00:00:30
>
> SERVERPORT 42559
> SERVERMODE NORMAL
>
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>
>
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 3
>
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>
> QUEUETIMEWEIGHT 1
>
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>
> #FSPOLICY PSDEDICATED
> #FSDEPTH 7
> #FSINTERVAL 86400
> #FSDECAY 0.80
>
> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>
> # NONE SPECIFIED
>
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>
> BACKFILLPOLICY FIRSTFIT
> RESERVATIONPOLICY CURRENTHIGHEST
>
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>
> NODEALLOCATIONPOLICY MINRESOURCE
>
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>
> # QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>
> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test] 17:00:00
> # SRDAYS[test] MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test] 0:30:00
>
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>
> # USERCFG[DEFAULT] FSTARGET=25.0
> # USERCFG[john] PRIORITY=100 FSTARGET=10.0-
> # GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
> # CLASSCFG[batch] FLAGS=PREEMPTEE
> # CLASSCFG[interactive] FLAGS=PREEMPTOR

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
