Dear list,

I'm having a strange problem when running MPI programs through the Maui/Torque scheduler. As long as I set ppn to anything larger than 1, the PBS_NODEFILE provides the correct hosts: nodes=4:ppn=2 works exactly as it should, and the file contains 4 hosts, each name printed twice. With nodes=4:ppn=1, however, it lists only two hosts, with 2 cores per host. A quick grep of the logs showed that a modification takes place:

--snip--
02/19 13:29:52 MPBSJobModify(39775,Resource_List,Resource,compute-1-1.local:ppn=2+compute-1-2.local:ppn=2)
02/19 13:29:52 MPBSJobModify(39775,Resource_List,Resource,4:ppn=1)
--snap--

Any idea what is going on here?

As to our setup - we have 16 nodes with 8 cpus each, here's an excerpt from our maui.cfg:

RMPOLLINTERVAL          00:00:30
SERVERMODE              NORMAL
RMCFG[base]             TYPE=PBS
LOGFILE                 maui.log
LOGFILEMAXSIZE          10000000
LOGLEVEL                3
QUEUETIMEWEIGHT         1
FSPOLICY                DEDICATEDPS
FSDEPTH                 7
FSINTERVAL              86400
FSDECAY                 0.80
BACKFILLPOLICY          FIRSTFIT
RESERVATIONPOLICY       CURRENTHIGHEST
NODEALLOCATIONPOLICY    MINRESOURCE
USERCFG[DEFAULT]        FSTARGET=20.0+
FSWEIGHT                10
FSUSERWEIGHT            100
ENFORCERESOURCELIMITS   ON
RESOURCELIMITPOLICY[0]  MEM:ALWAYS:CANCEL
SRCFG[small]            TASKCOUNT=1 RESOURCES=PROCS:4,MEM:16384
SRCFG[small]            HOSTLIST=cluster1.local
SRCFG[small]            PERIOD=INFINITY
SRCFG[small]            TIMELIMIT=1:00:00
SRCFG[small]            CLASSLIST=small

Regards,
Lech Nieroda

_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
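
For anyone who wants to reproduce the check described above, here is a minimal sketch of how the node file could be summarized from inside a job. It assumes only that Torque sets PBS_NODEFILE to a file with one host name per allocated slot; the function name summarize_nodefile is made up for illustration and is not part of Torque or Maui.

```python
import os
from collections import Counter


def summarize_nodefile(lines):
    """Return a Counter mapping each host name to its number of slots.

    Each non-empty line of a PBS node file is taken as one slot on
    the named host (hypothetical helper, for illustration only).
    """
    hosts = [line.strip() for line in lines if line.strip()]
    return Counter(hosts)


if __name__ == "__main__":
    # PBS_NODEFILE is only set inside a running Torque job.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as fh:
            counts = summarize_nodefile(fh)
        for host, slots in sorted(counts.items()):
            print(f"{host}: {slots}")
        print(f"{len(counts)} hosts, {sum(counts.values())} slots")
    else:
        print("PBS_NODEFILE is not set; run this inside a Torque job")
```

With nodes=4:ppn=1 one would expect four hosts with one slot each; the behaviour described above corresponds to two hosts with two slots each.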
