Hi,

It is not clear to me how you are trying to set up your runs, so please provide some details:
- are you trying to run multiple simulations concurrently on the same node, or are you trying to strong-scale?
- what are you simulating?
- can you provide log files of the runs?
Cheers,
--
Szilárd

On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro <[email protected]> wrote:
>
> Can no one give me an idea of what might be happening, or of how I can solve it?
> Best regards,
> Carlos
>
> ——————
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: [email protected] or [email protected]
>
> On July 19, 2019 at 2:20:41 PM, Carlos Navarro ([email protected]) wrote:
>
> Dear gmx-users,
> I’m currently working on a server where each node has 40 physical cores (40 threads) and 4 NVIDIA V100 GPUs.
> When I launch a single job (1 simulation using a single GPU card) I get a performance of about ~35 ns/day for a system of about 300k atoms. Looking at the usage of the video card during the simulation, I noticed that the card is used at about 80%.
> The problems arise when I increase the number of jobs running at the same time. If, for instance, 2 jobs are running at the same time, the performance drops to ~25 ns/day each, and the usage of the video cards also drops during the simulation to about 30-40% (sometimes to less than 5%).
> Clearly there is a communication problem between the GPU cards and the CPU during the simulations, but I don’t know how to solve it.
> Here is the script I use to run the simulations:
>
> #!/bin/bash -x
> #SBATCH --job-name=testAtTPC1
> #SBATCH --ntasks-per-node=4
> #SBATCH --cpus-per-task=20
> #SBATCH --account=hdd22
> #SBATCH --nodes=1
> #SBATCH --mem=0
> #SBATCH --output=sout.%j
> #SBATCH --error=s4err.%j
> #SBATCH --time=00:10:00
> #SBATCH --partition=develgpus
> #SBATCH --gres=gpu:4
>
> module use /gpfs/software/juwels/otherstages
> module load Stages/2018b
> module load Intel/2019.0.117-GCC-7.3.0
> module load IntelMPI/2019.0.117
> module load GROMACS/2018.3
>
> WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
> WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
> WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
> WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4
>
> DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
> EXE=" gmx mdrun "
>
> cd $WORKDIR1
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0 -ntomp 20 &>log &
> cd $WORKDIR2
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10 -ntomp 20 &>log &
> cd $WORKDIR3
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20 -ntomp 20 &>log &
> cd $WORKDIR4
> $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30 -ntomp 20 &>log &
>
> Regarding the pinoffset, I first tried using 20 cores for each job, but then also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2, pinoffset 8 for job 3, and pinoffset 12 for job 4), but in the end the problem persists.
>
> Currently on this machine I’m not able to use more than 1 GPU per job, so this is my only way to make proper use of the whole node.
> If you need more information, please just let me know.
> Best regards,
> Carlos
>
> ——————
> Carlos Navarro Retamal
> Bioinformatic Engineering. PhD.
> Postdoctoral Researcher in Center of Bioinformatics and Molecular Simulations
> Universidad de Talca
> Av. Lircay S/N, Talca, Chile
> E: [email protected] or [email protected]
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a
> mail to [email protected].
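
For reference, one likely source of the slowdown in the quoted script is that the pinned core ranges overlap: with `-ntomp 20`, job 1 is pinned to cores 0-19 while job 2 starts at core 10, so two runs fight over the same cores. A minimal sketch of a non-overlapping layout is below. It is not the poster's script: it assumes 10 OpenMP threads per run so the four core ranges (0-9, 10-19, 20-29, 30-39) are disjoint, uses `CUDA_VISIBLE_DEVICES` to give each run its own V100 (an assumption; `gmx mdrun -gpu_id` is an alternative), and passes `--cpu-bind=none` so srun's own binding does not override mdrun's `-pin on` (whether that is needed depends on the site's SLURM configuration).

```shell
#!/bin/bash -x
# Sketch: four concurrent single-GPU mdrun runs on one 40-core, 4-GPU node.
# Assumptions (not from the original post): 10 OpenMP threads per run,
# CUDA_VISIBLE_DEVICES for GPU assignment, --cpu-bind=none for srun.
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=10
#SBATCH --nodes=1
#SBATCH --gres=gpu:4

NTOMP=10   # threads per run; 4 runs x 10 threads = all 40 physical cores
BASEDIR=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu

for i in 0 1 2 3; do
    cd "$BASEDIR/$((i + 1))"
    # The offset grows by NTOMP, so the pinned core ranges stay disjoint:
    # run 0 -> cores 0-9, run 1 -> 10-19, run 2 -> 20-29, run 3 -> 30-39.
    CUDA_VISIBLE_DEVICES=$i \
    srun --exclusive -n 1 --gres=gpu:1 --cpu-bind=none \
        gmx mdrun -s eq6.tpr -deffnm eq6-$NTOMP -nmpi 1 \
        -pin on -pinoffset $((i * NTOMP)) -ntomp $NTOMP &> log &
done
wait   # keep the batch job alive until all four backgrounded runs finish
```

The key invariant is that `-pinoffset` advances by exactly `-ntomp` between runs; the original script's offsets of 0/10/20/30 would only be consistent with 10 threads per run, not 20.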
