I have one cluster of 256 nodes, but the hardware is split into two halves of 128 nodes each. Part A and Part B have different numbers of processors, processor speeds, memory, and so on. The cluster is set up so that the PBS submit script specifies which half to run on, based on the attributes set in the server_priv/nodes file.
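For readers unfamiliar with this kind of split, here is a rough sketch of how node properties in a TORQUE server_priv/nodes file (lines of the form "hostname np=N property ...") let a request such as nodes=32:ppn=2:partA pick one half of the cluster. The property names partA and partB, the host names, and the np=2 value for Part B are hypothetical placeholders, not this cluster's real values. The filter is also a simplification of what the scheduler actually does.

```python
# Hypothetical sketch: tag each half of the cluster with a property in a
# server_priv/nodes-style file, then keep only nodes matching a request.
# Host names, "partA"/"partB", and Part B's np=2 are made-up placeholders.
from dataclasses import dataclass, field


@dataclass
class NodeEntry:
    name: str
    np: int
    properties: set = field(default_factory=set)


def parse_nodes_file(text):
    """Parse lines of the form 'hostname np=N prop1 prop2 ...'."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, *rest = line.split()
        np = 1  # assumed default when np= is omitted
        props = set()
        for token in rest:
            if token.startswith("np="):
                np = int(token[3:])
            else:
                props.add(token)
        entries.append(NodeEntry(name, np, props))
    return entries


def eligible_nodes(entries, feature, ppn):
    """Names of nodes that carry the property and have at least ppn processors."""
    return [e.name for e in entries if feature in e.properties and e.np >= ppn]


SAMPLE = """
a001 np=4 partA
a002 np=4 partA
b001 np=2 partB
b002 np=2 partB
"""

nodes = parse_nodes_file(SAMPLE)
print(eligible_nodes(nodes, "partA", 2))
print(eligible_nodes(nodes, "partB", 2))
```

With this sample, a partA request only ever sees a001 and a002, and a partB request only sees b001 and b002, which is the effect the submit-script attributes are meant to have.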
I think I'm having a little trouble with the NODEALLOCATIONPOLICY. When I set it to MINRESOURCE, jobs can run on Part A of the cluster. However, the nodes in this half have 4 CPUs each, so when a user submits, say, several 32-node jobs, they are placed on the same nodes and all 4 CPUs on those nodes get used, which is not what I want. With this setting, Part B of the cluster starts jobs immediately and they complete fine.

So I changed NODEALLOCATIONPOLICY to CPULOAD. Now Part A behaves the way I want: if a user submits two 32-node jobs with 2 processors per node, the jobs are spread across different nodes. But now I have a problem running on Part B. When I submit a job to this side of the cluster, qstat shows it staying in the Queued state. The maui log says something like "256 feasible tasks found" (for a 128-node, ppn=2 job), but the next line says something like "inadequate tasks found for job <jobid> 2 < 256". However, if I qrun the job, it runs fine and completes successfully.

So I have a couple of questions. First, why does the job stay in the queue? Is it because NODEALLOCATIONPOLICY is set to CPULOAD? There is only one job running on this half of the cluster. Second, should I set NODEALLOCATIONPOLICY to a different value for this to work properly?

My maui.cfg is below. If anyone sees anything that could cause this behavior, please let me know. I have not made many changes to it; it is pretty much a basic setup. Any info would be appreciated. Thanks.
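A rough Python model of the two policies may make the Part A difference easier to see. As I read the Maui node allocation documentation, MINRESOURCE prefers the nodes with the fewest configured resources that still satisfy the job. CPULOAD prefers the nodes with the most unused CPU power (configured processors minus current load), and is applied only to jobs starting immediately, with MINRESOURCE used for future reservations. The code is a simplified sketch under those assumptions, not Maui's source; the Node fields, memory sizes, and load values are made up for illustration.

```python
# Simplified model (an assumption, not Maui source code) of how the
# MINRESOURCE and CPULOAD node allocation policies rank candidate nodes.
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    procs: int        # configured processors
    memory_mb: int    # configured memory (hypothetical values)
    busy_procs: int   # processors already dedicated to running jobs
    load: float       # current load average reported by the node


def feasible(nodes, ppn):
    """Nodes that still have at least ppn free processors."""
    return [n for n in nodes if n.procs - n.busy_procs >= ppn]


def pick_minresource(nodes, count, ppn):
    # Fewest configured resources first. Current usage is ignored, so
    # identical nodes are always ranked in the same order.
    ranked = sorted(feasible(nodes, ppn), key=lambda n: (n.procs, n.memory_mb, n.name))
    return [n.name for n in ranked[:count]]


def pick_cpuload(nodes, count, ppn):
    # Most unused CPU power (configured procs minus load) first.
    ranked = sorted(feasible(nodes, ppn), key=lambda n: (-(n.procs - n.load), n.name))
    return [n.name for n in ranked[:count]]


def tasks_needed(node_count, ppn):
    # A nodes=128:ppn=2 request asks for 128 * 2 = 256 tasks.
    return node_count * ppn


# Four identical 4-CPU nodes; a first 2-node, ppn=2 job already runs on a0 and a1.
part_a = [
    Node("a0", 4, 8192, busy_procs=2, load=2.0),
    Node("a1", 4, 8192, busy_procs=2, load=2.0),
    Node("a2", 4, 8192, busy_procs=0, load=0.0),
    Node("a3", 4, 8192, busy_procs=0, load=0.0),
]
print(pick_minresource(part_a, 2, 2))
print(pick_cpuload(part_a, 2, 2))
print(tasks_needed(128, 2))
```

In this toy example MINRESOURCE ranks identical nodes the same way every time, so the second job doubles up on a0 and a1, mirroring the Part A behavior described above, while CPULOAD sends it to the idle nodes a2 and a3. The last line is just the 256-task count that shows up in the log for a 128-node, ppn=2 request.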
# maui.cfg 3.2.6p18

SERVERHOST            marvin
# primary admin must be first in list
ADMIN1                root brad jbennett cpd

# Resource Manager Definition

RMCFG[marvin] TYPE=PBS
RMCFG[marvin] TIMEOUT=30
JOBAGGREGATIONTIME 00:00:10

# Allocation Manager Definition

AMCFG[bank]  TYPE=NONE

# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration

RMPOLLINTERVAL        00:01:00

SERVERPORT            42559
SERVERMODE            NORMAL

# Admin: http://supercluster.org/mauidocs/a.esecurity.html

LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html

QUEUETIMEWEIGHT       1

# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html

#FSPOLICY              PSDEDICATED
#FSDEPTH               7
#FSINTERVAL            86400
#FSDECAY               0.80

# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html

# NONE SPECIFIED

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html

NODEALLOCATIONPOLICY  CPULOAD

# QOS: http://supercluster.org/mauidocs/7.3qos.html

# QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE

# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html

# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test]   17:00:00
# SRDAYS[test]      MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test]   0:30:00

# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html

# USERCFG[DEFAULT]      FSTARGET=25.0
# USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
# GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch]       FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR

## Additions made to the maui config file upon build. Brad
NODEAVAILABILITYPOLICY DEDICATED:SWAP
JOBNODEMATCHPOLICY     EXACTNODE
NODEACCESSPOLICY       SHARED
NODEMAXLOAD            3.5
DEFERTIME              0 0
LOGDIR                 /var/spool/maui/log
LOGFILEROLLDEPTH       10
STATDIR                /var/spool/maui/stats
###test to help with running on otis nodes:
ENABLEMULTIREQJOBS     TRUE

-- 
Brad Mecklenburg

_______________________________________________
mauiusers mailing list
http://www.supercluster.org/mailman/listinfo/mauiusers
