Hi Szilárd,

To answer your questions:

** are you trying to run multiple simulations concurrently on the same node or are you trying to strong-scale?
I'm trying to run multiple simulations on the same node at the same time.
** what are you simulating?
Regular and CompEl simulations.

** can you provide log files of the runs?
Some log files are at the following link: https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0. In short: alone.log -> a single run on the node (using 1 GPU); multi1/2/3/4.log -> 4 independent simulations run at the same time on a single node. In all cases, 20 CPUs are used.

Best regards,
Carlos

On Thu, Jul 25, 2019 at 10:59, Szilárd Páll (<pall.szil...@gmail.com>) wrote:

> Hi,
>
> It is not clear to me how you are trying to set up your runs, so
> please provide some details:
> - are you trying to run multiple simulations concurrently on the same
> node or are you trying to strong-scale?
> - what are you simulating?
> - can you provide log files of the runs?
>
> Cheers,
>
> --
> Szilárd
>
> On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
> <carlos.navarr...@gmail.com> wrote:
> >
> > Can no one give me an idea of what might be happening, or of how I can
> > solve it?
> > Best regards,
> > Carlos
> >
> > ——————
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> >
> > On July 19, 2019 at 2:20:41 PM, Carlos Navarro
> > (carlos.navarr...@gmail.com) wrote:
> >
> > Dear gmx-users,
> > I'm currently working on a server where each node has 40 physical cores
> > (40 threads) and 4 NVIDIA V100s.
> > When I launch a single job (1 simulation using a single GPU card) I get a
> > performance of about ~35 ns/day for a system of about 300k atoms. Looking
> > at the usage of the video card during the simulation, I noticed that the
> > card is being used at about ~80%.
> > The problems arise when I increase the number of jobs running at the same
> > time.
> > If, for instance, 2 jobs are running at the same time, the performance
> > drops to ~25 ns/day each, and the usage of the video cards also drops
> > during the simulation to about 30-40% (sometimes to less than 5%).
> > Clearly there is a communication problem between the GPU cards and the
> > CPU during the simulations, but I don't know how to solve it.
> > Here is the script I use to run the simulations:
> >
> > #!/bin/bash -x
> > #SBATCH --job-name=testAtTPC1
> > #SBATCH --ntasks-per-node=4
> > #SBATCH --cpus-per-task=20
> > #SBATCH --account=hdd22
> > #SBATCH --nodes=1
> > #SBATCH --mem=0
> > #SBATCH --output=sout.%j
> > #SBATCH --error=s4err.%j
> > #SBATCH --time=00:10:00
> > #SBATCH --partition=develgpus
> > #SBATCH --gres=gpu:4
> >
> > module use /gpfs/software/juwels/otherstages
> > module load Stages/2018b
> > module load Intel/2019.0.117-GCC-7.3.0
> > module load IntelMPI/2019.0.117
> > module load GROMACS/2018.3
> >
> > WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> > WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> > WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> > WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
> >
> > DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> > EXE=" gmx mdrun "
> >
> > cd $WORKDIR1
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0
> > -ntomp 20 &>log &
> > cd $WORKDIR2
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10
> > -ntomp 20 &>log &
> > cd $WORKDIR3
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20
> > -ntomp 20 &>log &
> > cd $WORKDIR4
> > $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30
> > -ntomp 20 &>log &
> >
> > Regarding the pinoffset, I first tried using 20 cores for each job, but
> > then also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for
> > job 2, pinoffset 8 for job 3 and pinoffset 12 for job 4), but in the end
> > the problem persists.
> >
> > Currently on this machine I'm not able to use more than 1 GPU per job, so
> > this is my only option for making proper use of the whole node.
> > If you need more information, please just let me know.
> > Best regards,
> > Carlos
> >
> > ——————
> > Carlos Navarro Retamal
> > Bioinformatic Engineering. PhD.
> > Postdoctoral Researcher in Center of Bioinformatics and Molecular
> > Simulations
> > Universidad de Talca
> > Av. Lircay S/N, Talca, Chile
> > E: carlos.navarr...@gmail.com or cnava...@utalca.cl
> > --
> > Gromacs Users mailing list
> >
> > * Please search the archive at
> > http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> >
> > * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> >
> > * For (un)subscribe requests visit
> > https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> > send a mail to gmx-users-requ...@gromacs.org.

--
----------
Carlos Navarro Retamal
Bioinformatic Engineering. PhD
Postdoctoral Researcher in Center for Bioinformatics and Molecular Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
T: (+56) 712201 798
E: carlos.navarr...@gmail.com or cnava...@utalca.cl
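For reference, the arithmetic behind non-overlapping `-pinoffset` values can be sketched as below. This is a hypothetical helper, not part of the script from the thread: with `-ntomp T` threads per job, job i needs `-pinoffset i*T`, and keeping the jobs on disjoint cores requires njobs × T ≤ hardware threads on the node. Note that in the script above, 4 jobs × 20 threads = 80 exceeds the node's 40 threads, so ranges pinned at offsets 0/10/20/30 would overlap.

```shell
#!/bin/bash
# Hypothetical sketch (not from the thread): derive -pinoffset/-ntomp
# values so that NJOBS concurrent mdrun processes pin to disjoint
# hardware-thread ranges on one node.
NTHREADS=40   # hardware threads on the node (40 physical cores here)
NJOBS=4       # concurrent simulations
NTOMP=$(( NTHREADS / NJOBS ))   # threads per job: 10 in this case

for i in $(seq 0 $(( NJOBS - 1 ))); do
  # job i occupies hardware threads [i*NTOMP, (i+1)*NTOMP - 1]
  echo "job $i: gmx mdrun ... -pin on -pinoffset $(( i * NTOMP )) -ntomp $NTOMP"
done
```

With these values each job gets 10 threads at offsets 0, 10, 20, and 30, so the four pinned ranges tile the 40 hardware threads exactly without overlap.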