On Tue, Jul 30, 2019 at 3:29 PM Carlos Navarro <carlos.navarr...@gmail.com> wrote:
>
> Hi all,
> First of all, thanks for all your valuable input!
> I tried Szilárd's suggestion (multi simulations) with the following commands
> (using a single node):
>
> EXE="mpirun -np 4 gmx_mpi mdrun "
>
> cd $WORKDIR0
> #$DO_PARALLEL
> $EXE -s 4q.tpr -deffnm 4q -dlb yes -resethway -multidir 1 2 3 4
>
> And I noticed that the performance went from 37, 32, 23 and 22 ns/day to
> ~42 ns/day in all four simulations. I checked that the 80 processors were
> being used 100% of the time, while the GPU was used at about 50% (down
> from ~70% when running a single simulation on the node, where I obtain a
> performance of ~50 ns/day).
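A minimal single-node batch script for this kind of multi-simulation launch
could look roughly like the sketch below; the SLURM resource requests, the
module line, and the directory layout are illustrative assumptions rather
than the script that was actually used.

#!/bin/bash -x
#SBATCH --job-name=multi-md
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4      # one MPI rank (and one GPU) per simulation
#SBATCH --cpus-per-task=20       # 20 hardware threads per rank on a 40-core/80-thread node
#SBATCH --gres=gpu:4
#SBATCH --time=24:00:00

module load GROMACS              # site-specific; adjust to the local module name

# WORKDIR0 is assumed to contain subdirectories 1 2 3 4, each with its own 4q.tpr
cd $WORKDIR0

# One rank per member simulation; with 4 ranks and 4 GPUs on the node,
# mdrun will typically map one GPU to each rank by default.
mpirun -np 4 gmx_mpi mdrun -s 4q.tpr -deffnm 4q -dlb yes -resethway -multidir 1 2 3 4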
Great! Note that optimizing hardware utilization doesn't always maximize
performance.

Also, manual launches with pinoffset/pinstride will give exactly the same
performance as the multi runs *if* you get the affinities right. In your
original commands you tried to use 20 of the 80 hardware threads per rank,
but you offset the runs by only 10 (hardware threads), which means that the
runs were overlapping and interfering with each other, as well as
under-utilizing the hardware.

> So overall I'm quite happy with the performance I'm getting now; and
> honestly, I don't know if at some point I can get the same performance
> (running 4 jobs) that I'm getting running just one.

No, but you _may_ get a bit more aggregate performance if you run 8
concurrent jobs. Also, you can try 1 thread per core
("mpirun -np 4 gmx_mpi mdrun -multidir 1 2 3 4 -ntomp 10 -pin on") to use
only half of the threads.

Cheers,
--
Szilárd

> Best regards,
> Carlos
>
> ——————
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
>
> On July 29, 2019 at 6:11:31 PM, Mark Abraham (mark.j.abra...@gmail.com)
> wrote:
>
> Hi,
>
> Yes, and the -nmpi I copied from Carlos's post is ineffective - use -ntmpi.
>
> Mark
>
> On Mon., 29 Jul. 2019, 15:15 Justin Lemkul, <jalem...@vt.edu> wrote:
>
> > On 7/29/19 8:46 AM, Carlos Navarro wrote:
> > > Hi Mark,
> > > I tried that before, but unfortunately in that case (removing
> > > --gres=gpu:1 and including in each line the -gpu_id flag) for some
> > > reason the jobs are run one at a time (one after the other), so I
> > > can't properly use the whole node.
> >
> > You need to run all but the last mdrun process in the background (&).
> >
> > -Justin
> >
> > > ——————
> > > Carlos Navarro Retamal
> > > Bioinformatic Engineering. PhD.
> > > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > > Simulations
> > > Universidad de Talca
> > > Av. Lircay S/N, Talca, Chile
> > > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > >
> > > On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
> > > wrote:
> > >
> > > Hi,
> > >
> > > When you use
> > >
> > > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > >
> > > then the environment seems to make sure only one GPU is visible. (The
> > > log files report only finding one GPU.) But it's probably the same GPU
> > > in each case, with three remaining idle. I would suggest not using
> > > --gres unless you can specify *which* of the four available GPUs each
> > > run can use.
> > >
> > > Otherwise, don't use --gres and use the facilities built into GROMACS,
> > > e.g.
> > >
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0 -ntomp 20 -gpu_id 0
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10 -ntomp 20 -gpu_id 1
> > > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20 -ntomp 20 -gpu_id 2
> > > etc.
> > >
> > > Mark
> > >
> > > On Mon, 29 Jul 2019 at 11:34, Carlos Navarro <carlos.navarr...@gmail.com>
> > > wrote:
> > >
> > >> Hi Szilárd,
> > >> To answer your questions:
> > >> ** are you trying to run multiple simulations concurrently on the same
> > >> node or are you trying to strong-scale?
> > >> I'm trying to run multiple simulations on the same node at the same
> > >> time.
> > >>
> > >> ** what are you simulating?
> > >> Regular and CompEl simulations
> > >>
> > >> ** can you provide log files of the runs?
> > >> In the following link are some log files:
> > >> https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0.
> > >> In short, alone.log -> a single run on the node (using 1 GPU);
> > >> multi1/2/3/4.log -> 4 independent simulations run at the same time on
> > >> a single node. In all cases, 20 CPUs are used.
> > >> Best regards,
> > >> Carlos
> > >>
> > >> On Thu, Jul 25, 2019 at 10:59, Szilárd Páll (<pall.szil...@gmail.com>)
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> It is not clear to me how you are trying to set up your runs, so
> > >>> please provide some details:
> > >>> - are you trying to run multiple simulations concurrently on the same
> > >>> node or are you trying to strong-scale?
> > >>> - what are you simulating?
> > >>> - can you provide log files of the runs?
> > >>>
> > >>> Cheers,
> > >>>
> > >>> --
> > >>> Szilárd
> > >>>
> > >>> On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> > >>> <carlos.navarr...@gmail.com> wrote:
> > >>>> Can no one give me an idea of what might be happening? Or how I can
> > >>>> solve it?
> > >>>> Best regards,
> > >>>> Carlos
> > >>>>
> > >>>> ——————
> > >>>> Carlos Navarro Retamal
> > >>>> Bioinformatic Engineering. PhD.
> > >>>> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > >>>> Simulations
> > >>>> Universidad de Talca
> > >>>> Av. Lircay S/N, Talca, Chile
> > >>>> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > >>>>
> > >>>> On July 19, 2019 at 2:20:41 PM, Carlos Navarro (carlos.navarr...@gmail.com)
> > >>>> wrote:
> > >>>>
> > >>>> Dear gmx-users,
> > >>>> I'm currently working on a server where each node possesses 40
> > >>>> physical cores (40 threads) and 4 Nvidia V100 GPUs.
> > >>>> When I launch a single job (1 simulation using a single GPU card) I
> > >>>> get a performance of ~35 ns/day for a system of about 300k atoms.
> > >>>> Looking at the usage of the video card during the simulation, I
> > >>>> notice that the card is being used at about ~80%.
> > >>>> The problems arise when I increase the number of jobs running at the
> > >>>> same time. If, for instance, 2 jobs are running at the same time, the
> > >>>> performance drops to ~25 ns/day each, and the usage of the video
> > >>>> cards also drops during the simulation to about 30-40% (sometimes
> > >>>> dropping to less than 5%). Clearly there is a communication problem
> > >>>> between the GPU cards and the CPU during the simulations, but I don't
> > >>>> know how to solve this.
> > >>>> Here is the script I use to run the simulations:
> > >>>>
> > >>>> #!/bin/bash -x
> > >>>> #SBATCH --job-name=testAtTPC1
> > >>>> #SBATCH --ntasks-per-node=4
> > >>>> #SBATCH --cpus-per-task=20
> > >>>> #SBATCH --account=hdd22
> > >>>> #SBATCH --nodes=1
> > >>>> #SBATCH --mem=0
> > >>>> #SBATCH --output=sout.%j
> > >>>> #SBATCH --error=s4err.%j
> > >>>> #SBATCH --time=00:10:00
> > >>>> #SBATCH --partition=develgpus
> > >>>> #SBATCH --gres=gpu:4
> > >>>>
> > >>>> module use /gpfs/software/juwels/otherstages
> > >>>> module load Stages/2018b
> > >>>> module load Intel/2019.0.117-GCC-7.3.0
> > >>>> module load IntelMPI/2019.0.117
> > >>>> module load GROMACS/2018.3
> > >>>>
> > >>>> WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> > >>>> WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> > >>>> WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> > >>>> WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
> > >>>>
> > >>>> DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > >>>> EXE=" gmx mdrun "
> > >>>>
> > >>>> cd $WORKDIR1
> > >>>> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0 -ntomp 20 &>log &
> > >>>> cd $WORKDIR2
> > >>>> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10 -ntomp 20 &>log &
> > >>>> cd $WORKDIR3
> > >>>> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20 -ntomp 20 &>log &
> > >>>> cd $WORKDIR4
> > >>>> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30 -ntomp 20 &>log &
> > >>>>
> > >>>> Regarding the pinoffset, I first tried using 20 cores for each job,
> > >>>> but then also tried with 8 cores (so pinoffset 0 for job 1, pinoffset
> > >>>> 4 for job 2, pinoffset 8 for job 3 and pinoffset 12 for job 4), but
> > >>>> in the end the problem persists.
> > >>>>
> > >>>> Currently on this machine I'm not able to use more than 1 GPU per
> > >>>> job, so this is my only option to properly use the whole node.
> > >>>> If you need more information please just let me know.
> > >>>> Best regards,
> > >>>> Carlos
> > >>>>
> > >>>> ——————
> > >>>> Carlos Navarro Retamal
> > >>>> Bioinformatic Engineering. PhD.
> > >>>> Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > >>>> Simulations
> > >>>> Universidad de Talca
> > >>>> Av. Lircay S/N, Talca, Chile
> > >>>> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
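For comparison with the script above, here is a rough sketch of what the
manual per-GPU launches could look like with the fixes discussed earlier in
this thread: -ntmpi instead of the ineffective -nmpi, an explicit GPU id per
run, no srun --gres=gpu:1 wrapper (so all four GPUs stay visible), and
non-overlapping pinning (each 20-thread run offset by 20 hardware threads
with stride 1, so the four runs tile the 80 hardware threads). The working
directories and tpr names are just the placeholders from the script above;
the exact pinning that works best may differ on other hardware.

EXE="gmx mdrun"

cd $WORKDIR1
$EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinstride 1 -pinoffset 0  -gpu_id 0 &>log &
cd $WORKDIR2
$EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinstride 1 -pinoffset 20 -gpu_id 1 &>log &
cd $WORKDIR3
$EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinstride 1 -pinoffset 40 -gpu_id 2 &>log &
cd $WORKDIR4
$EXE -s eq6.tpr -deffnm eq6-20 -ntmpi 1 -ntomp 20 -pin on -pinstride 1 -pinoffset 60 -gpu_id 3 &>log &

# Wait for all four backgrounded runs before the batch job exits.
wait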
> > >>
> > >> --
> > >> ----------
> > >> Carlos Navarro Retamal
> > >> Bioinformatic Engineering. PhD
> > >> Postdoctoral Researcher in Center for Bioinformatics and Molecular
> > >> Simulations
> > >> Universidad de Talca
> > >> Av. Lircay S/N, Talca, Chile
> > >> T: (+56) 712201 798
> > >> E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> >
> > --
> > ==================================================
> >
> > Justin A. Lemkul, Ph.D.
> > Assistant Professor
> > Office: 301 Fralin Hall
> > Lab: 303 Engel Hall
> >
> > Virginia Tech Department of Biochemistry
> > 340 West Campus Dr.
> > Blacksburg, VA 24061
> >
> > jalem...@vt.edu | (540) 231-3129
> > http://www.thelemkullab.com
> >
> > ==================================================