Hi,

First off, have you considered option 2, using multi-sim? That would spare
you from having to set the offsets manually. Can you not submit your jobs
such that you fill at least a node?

How many threads/cores does your node have? Can you share log files?

Cheers,
--
Szilárd
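A minimal sketch of what such a multi-sim launch could look like, assuming
an MPI build (gmx_mpi), one 40-core node with four GPUs, and four
hypothetical input directories sim1..sim4 each containing a topol.tpr (the
counts and names are illustrative, not taken from the thread):

    # One MPI job running four simulations side by side; mdrun pins threads
    # across the whole node itself, so no manual -pinoffset is needed.
    # -gpu_id 0123 gives each of the four ranks its own GPU.
    mpirun -np 4 gmx_mpi mdrun -multidir sim1 sim2 sim3 sim4 \
           -ntomp 10 -gpu_id 0123 -pin on -maxh 0.12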
On Wed, Sep 13, 2017 at 9:14 PM, gromacs query <[email protected]> wrote:
> Hi Szilárd,
>
> Sorry, I was a bit quick to say it's working with -pinoffset. I just
> submitted four identical jobs (2 GPUs, 4 ranks each) on the same node,
> with -pin on and -pinoffset set to 0, 5, 10 and 15 (the numbers should be
> fine, as there are 40 cores on the node). I still don't get the same
> performance as expected from a single independent job (all variably less
> than 50% of it). Now I am wondering whether it is still related to an
> overlap of cores, as -pin on should lock the cores for the same job.
>
> -J
>
> On Wed, Sep 13, 2017 at 7:33 PM, gromacs query <[email protected]> wrote:
>
>> Hi Szilárd,
>>
>> Thanks, option 3 was on my mind, but now I need to figure out how :)
>> Manually fixing the -pinoffset seems to work in some quick tests.
>> I think option 1 would require asking the admin, but I can try option 3
>> myself; as there are other users from different places who may not
>> bother using option 3, I think I would need to ask the admin to enforce
>> option 1, but before that I will try option 3.
>>
>> Jiom
>>
>> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll <[email protected]> wrote:
>>
>>> J,
>>>
>>> You have a few options:
>>>
>>> * Use SLURM to assign not only the set of GPUs but also the correct
>>> set of CPU cores to each mdrun process. If you do so, mdrun will
>>> respect the affinity mask it inherits, and your two mdrun jobs should
>>> run on the right sets of cores. This has the drawback that (AFAIK)
>>> SLURM/aprun (or srun) will not allow you to bind each application
>>> thread to a core/hardware thread (which is what mdrun does), only a
>>> process to a group of cores/hw threads, which can sometimes lead to
>>> performance loss. (You might be able to compensate using some OpenMP
>>> library environment variables, though.) See the srun sketch after the
>>> quoted thread.
>>>
>>> * Run multiple jobs with mdrun "-multi"/"-multidir" (either two per
>>> node or multiple across nodes) and benefit from the rank/thread to
>>> core/hw-thread assignment that is also supported across the multiple
>>> simulations that are part of a multi-run; e.g.:
>>>   mpirun -np 4 gmx_mpi mdrun -ntomp N -multidir my_input_dir{1,2,3,4}
>>> will launch 4 ranks and start 4 simulations, one in each of the four
>>> directories passed.
>>>
>>> * Write a wrapper script around gmx mdrun, which will be what you
>>> launch with SLURM; you can then inspect the node and decide what
>>> -pinoffset value to pass to your mdrun launch command. See the wrapper
>>> sketch after the quoted thread.
>>>
>>> I hope one of these will deliver the desired results :)
>>>
>>> Cheers,
>>> --
>>> Szilárd
>>>
>>> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query <[email protected]> wrote:
>>> > Hi Szilárd,
>>> >
>>> > Thanks for your reply. This is useful, but now I am thinking that,
>>> > because SLURM launches jobs in an automated way, it is not really in
>>> > my control to choose the node. So the following things can happen;
>>> > say for two mdrun jobs I set -pinoffset 0 and -pinoffset 4:
>>> >
>>> > - if they run on the same node, this is good
>>> > - if the jobs run on different nodes (partially occupied or free),
>>> > the chosen pinoffsets may not make sense, as I don't know what
>>> > pinoffset I would need to set
>>> > - if I have to submit many jobs together and SLURM itself chooses
>>> > different/the same nodes, then I think it is difficult to define the
>>> > pinoffset.
>>> >
>>> > -
>>> > J
>>> >
>>> > On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll <[email protected]> wrote:
>>> >
>>> >> My guess is that the two jobs are using the same cores -- either
>>> >> all cores/threads or only half of them, but the same set.
>>> >>
>>> >> You should use -pinoffset; see:
>>> >>
>>> >> - Docs and example:
>>> >> http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
>>> >>
>>> >> - More explanation of the thread-pinning behaviour on the old website:
>>> >> http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Pinning_threads_to_physical_cores
>>> >>
>>> >> Cheers,
>>> >> --
>>> >> Szilárd
>>> >>
>>> >> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query <[email protected]> wrote:
>>> >> > Sorry, forgot to add: we thought the two jobs were using the same
>>> >> > GPU ids, but CUDA_VISIBLE_DEVICES shows the two jobs are using
>>> >> > different ids (0,1 and 2,3).
>>> >> >
>>> >> > -
>>> >> > J
>>> >> >
>>> >> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query <[email protected]> wrote:
>>> >> >
>>> >> >> Hi All,
>>> >> >>
>>> >> >> I have some issues with GROMACS performance. There are many
>>> >> >> nodes, each node has a number of GPUs, and the batch system is
>>> >> >> SLURM. Although I get good performance with some settings for
>>> >> >> the number of GPUs and ranks, when I submit the same job twice
>>> >> >> on the same node the performance drops drastically. E.g.:
>>> >> >>
>>> >> >> with 2 GPUs I get 300 ns per day when there is no other job
>>> >> >> running on the node. When I submit the same job twice on the
>>> >> >> same node at the same time, I get only 17 ns/day for both jobs.
>>> >> >> I am using this:
>>> >> >>
>>> >> >> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
>>> >> >>
>>> >> >> Any suggestions highly appreciated.
>>> >> >>
>>> >> >> Thanks
>>> >> >>
>>> >> >> Jiom
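For option 3 above, a minimal wrapper sketch under the assumptions of this
thread (4 ranks x 2 OpenMP threads = 8 hardware threads per job on a
40-core node); note that it is racy if two jobs land on the node at the
same instant:

    #!/bin/bash
    # Hypothetical wrapper launched by SLURM instead of mdrun directly:
    # count the gmx_mpi ranks already running on this node and derive a
    # -pinoffset that starts the new job on the first free hardware thread.
    BUSY=$(pgrep -c -x gmx_mpi)   # ranks already running; prints 0 if none
    OFFSET=$(( BUSY * 2 ))        # 2 OpenMP threads per rank
    exec mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 \
         -pin on -pinoffset "$OFFSET" -maxh 0.12

Incidentally, this arithmetic suggests why offsets of 0, 5, 10 and 15 can
still collide: each 4-rank x 2-thread job spans 8 consecutive hardware
threads (at the default stride), so the offsets would need to be at least
8 apart, e.g. 0, 8, 16 and 24.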
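And for option 1, a sketch of delegating the binding to SLURM (the
resource counts are assumptions, and older SLURM versions spell the flag
--cpu_bind):

    # Bind each of the 4 ranks to its own cores; mdrun will then respect
    # the affinity mask it inherits instead of pinning threads itself.
    srun --ntasks=4 --cpus-per-task=2 --cpu-bind=cores --gres=gpu:2 \
         gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12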
