Hi everybody, thanks for all the answers. I tried everything you pointed out: including #PBS -l nodes=1:ppn=12 and adding JOBNODEMATCHPOLICY EXACTNODE to maui.cfg, but none of it worked. I'm starting to think the problem is in some other config parameter (Maui or Torque). I will keep reading about all of this. Thanks!!

----------------------------------------------------
Ing. Fernando Caba
Director General de Telecomunicaciones
Universidad Nacional del Sur
http://www.dgt.uns.edu.ar
Tel/Fax: (54)-291-4595166
Tel: (54)-291-4595101 int. 2050
Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
----------------------------------------------------

On 28/09/2011 12:33 PM, Gus Correa wrote:
> Hi Fernando
>
> Dennis already pointed out the first/main problem.
> Your Torque/PBS script is not requesting a specific number of nodes
> and cores/processors.
> You can ask for 12 processors, even if your MPI command doesn't
> use all of them:
>
> #PBS -l nodes=1:ppn=12
>
> [You can still do mpirun -np 8 if you want.]
>
> This will prevent two jobs from running on the same node [which seems
> to be your goal, if I understood it right].
>
> I also like to add the queue name [even if it is the default]
> and the job name [for documentation and stdout/stderr
> naming consistency]:
>
> #PBS -q myqueue [whatever you called your queue]
> #PBS -N myjob [15 characters at most, the rest gets truncated]
>
> The #PBS clauses must be together and right after the #!/bin/sh line.
>
> Ask your users to always add these lines to their jobs.
> There is a feature of Torque that lets you write a wrapper
> that will do whatever you want to the job script,
> but if your pool of users is small
> you can just ask them to cooperate.
>
> Of course there is much more that you can add.
> 'man qsub' and 'man pbs_resources' are good sources of information,
> highly recommended reading.
>
> Then there is what Antonio Messina mentioned, the cpuset feature
> of Torque.
> I don't know if you installed Torque with this feature enabled.
> However, if you did, it will allow specific cores to be
> assigned to each process, which could allow node sharing without
> jobs stepping on each other's toes.
> However:
> A) this requires a bit more setup [not a lot, check the
> list archives and the Torque Admin Guide]
> B) if your users are cooperative and request 12 processors for each job,
> and you're using the Maui 'JOBNODEMATCHPOLICY EXACTNODE' setting, each job
> will get a single node anyway.
>
> BTW, did you restart Maui after you added 'JOBNODEMATCHPOLICY EXACTNODE'
> to the maui.cfg file?
>
> I hope this helps,
> Gus Correa
>
> Fernando Caba wrote:
>> Hi Gus, my node file /var/spool/torque/server_priv/nodes looks like:
>>
>> [root@fe server_priv]# more nodes
>> n10 np=12
>> n11 np=12
>> n12 np=12
>> n13 np=12
>> [root@fe server_priv]#
>>
>> It is exactly as in your comment.
>>
>> My script:
>>
>> #!/bin/bash
>>
>> cd $PBS_O_WORKDIR
>>
>> mpirun -np 8 /usr/local/vasp/vasp
>>
>> launches 8 vasp processes on one node. If I start one more job (with -np 8),
>> it will run on the same node (n13).
>> So if I start another job with -np 8
>> (or -np 4), it will also run on node n13.
>>
>> I configured JOBNODEMATCHPOLICY EXACTNODE in maui.cfg,
>> but unfortunately the jobs still ran on node n13.
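For reference, the directives Gus lists above combine into a job script along these lines. This is only a sketch: the queue name "batch" and job name "vasp_job" are placeholders to adapt, and the vasp path is taken from the scripts quoted in this thread.

```sh
#!/bin/bash
#PBS -l nodes=1:ppn=12   # request one whole node (all 12 cores)
#PBS -q batch            # placeholder: use your actual queue name
#PBS -N vasp_job         # job name, 15 characters at most

cd $PBS_O_WORKDIR

# You may still run fewer MPI ranks than the 12 cores requested;
# the full-node request is what keeps other jobs off this node.
mpirun -np 8 /usr/local/vasp/vasp
```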
>> This is an example of the output of top:
>>
>> top - 00:05:53 up 14 days, 6:47, 1 user, load average: 4.18, 4.06, 4.09
>> Mem:  15955108k total, 13287888k used,  2667220k free,   142168k buffers
>> Swap: 67111528k total,    16672k used, 67094856k free, 11360332k cached
>>
>>   PID USER     PR NI VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
>> 21796 patricia 25  0 463m 291m 12m R 100.5 1.9 517:29.59 vasp
>> 21797 patricia 25  0 448m 276m 11m R 100.2 1.8 518:51.49 vasp
>> 21798 patricia 25  0 458m 287m 11m R 100.2 1.8 522:01.79 vasp
>> 21799 patricia 25  0 448m 276m 11m R  99.9 1.8 519:04.25 vasp
>>     1 root     15  0 10348 672 568 S  0.0 0.0 0:00.53 init
>>     2 root     RT -5     0   0   0 S  0.0 0.0 0:00.06 migration/0
>>     3 root     34 19     0   0   0 S  0.0 0.0 0:00.00 ksoftirqd/0
>>     4 root     RT -5     0   0   0 S  0.0 0.0 0:00.00 watchdog/0
>>     5 root     RT -5     0   0   0 S  0.0 0.0 0:00.04 migration/1
>>
>> The job that generates those 4 vasp processes is:
>>
>> #!/bin/bash
>>
>> cd $PBS_O_WORKDIR
>>
>> mpirun -np 4 /usr/local/vasp/vasp
>>
>> Thanks
>>
>> On 27/09/2011 08:07 PM, Gus Correa wrote:
>>> Hi Fernando
>>>
>>> Did you try something like this in your
>>> ${TORQUE}/server_priv/nodes file?
>>>
>>> frontend np=12 [skip this line if the frontend is not to do job work]
>>> node1 np=12
>>> node2 np=12
>>> node3 np=12
>>> node4 np=12
>>>
>>> This is probably the first thing to do.
>>> It is not Maui, just plain Torque [actually pbs_server configuration].
>>>
>>> The lines above assume your nodes are called node1, ...
>>> and the head node is called frontend,
>>> in some name-resolvable manner [most likely
>>> in your /etc/hosts file, most likely pointing to the nodes'
>>> IP addresses in your cluster's private subnet, 192.168.X.X,
>>> 10.X.X.X or equivalent].
>>>
>>> The 'np=12' clause will allow at most 12 *processes* per node.
>>>
>>> [However, if VASP is *threaded*, say via OpenMP, then it won't
>>> prevent each process from launching several threads.
>>> To handle threaded applications you can use some tricks, such as
>>> requesting more cores than processes.
>>> Sorry, I am not familiar enough with VASP to say more than this.]
>>>
>>> I would suggest that you take a look at the Torque Admin Manual
>>> for more details:
>>> http://www.adaptivecomputing.com/resources/docs/torque/
>>>
>>> There are further controls in Maui, such as
>>> 'JOBNODEMATCHPOLICY EXACTNODE' in maui.cfg,
>>> for instance, if you want full nodes allocated to each job,
>>> as opposed to jobs sharing cores on a single node.
>>> However, these choices may come later.
>>> [You can change maui.cfg and restart the maui scheduler to
>>> test various changes.]
>>>
>>> For Maui details see the Maui Admin Guide:
>>> http://www.adaptivecomputing.com/resources/docs/maui/index.php
>>>
>>> I hope this helps,
>>> Gus Correa
>>>
>>> Fernando Caba wrote:
>>>> Hi everybody, I am using Torque 3.0.1 and Maui 3.3.1 in a configuration
>>>> composed of a front end and 4 nodes (2 processors, 6 cores each),
>>>> 48 cores in total.
>>>> I need to configure things so that no more than 12 processes run on
>>>> each node (in particular we are using vasp), so we want no more than
>>>> 12 vasp processes per node.
>>>> How can I configure this? I'm quite confused after reading so much
>>>> Torque and Maui configuration information.
>>>>
>>>> Thanks in advance.
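The maui.cfg change and scheduler restart discussed in this thread would look roughly like this. The file path and restart command depend on how Maui was installed on your system, and the NODEACCESSPOLICY line is an optional, related parameter not mentioned above:

```sh
# /usr/local/maui/maui.cfg  (path varies with your install)
JOBNODEMATCHPOLICY  EXACTNODE

# Optional, not discussed above: dedicate a node entirely to one job,
# regardless of how many processors the job requested.
# NODEACCESSPOLICY  SINGLEJOB

# Restart the scheduler so the change takes effect, e.g.:
#   /etc/init.d/maui restart      (or kill the maui daemon and start it again)
```

After restarting, the standard Maui commands 'checknode <nodename>' and 'diagnose -n' can help confirm how the scheduler sees each node's processor count and allocation.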
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
