On 7/29/19 8:46 AM, Carlos Navarro wrote:
Hi Mark,
I tried that before, but unfortunately in that case (removing --gres=gpu:1
and adding the -gpu_id flag to each line) the jobs for some reason run one
at a time (one after the other), so I can't make proper use of the whole
node.


You need to run all but the last mdrun process in the background (&).
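
A minimal sketch of this pattern, assuming a bash batch script. `run_sim` is a hypothetical stand-in (it only sleeps) for the real `srun ... gmx mdrun ...` line, and the closing `wait` is used here as an alternative to leaving the last command in the foreground:

```shell
#!/bin/bash
# run_sim is a hypothetical stand-in for one "srun ... gmx mdrun ..." line;
# it only sleeps briefly and records that it finished.
log=$(mktemp)
run_sim() {
    sleep 1
    echo "finished $1" >> "$log"
}

for name in sim1 sim2 sim3 sim4; do
    run_sim "$name" &      # '&' returns at once, so the four jobs overlap
done
wait                       # block until every background job has exited

done_count=$(grep -c '^finished' "$log")
echo "$done_count of 4 jobs finished"
rm -f "$log"
```

Without the `wait` (or a foreground last command), the batch script could reach its end while the runs are still in progress, which typically ends the job.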

-Justin

——————
Carlos Navarro Retamal
Bioinformatic Engineering. PhD.
Postdoctoral Researcher in Center of Bioinformatics and Molecular
Simulations
Universidad de Talca
Av. Lircay S/N, Talca, Chile
E: carlos.navarr...@gmail.com or cnava...@utalca.cl

On July 29, 2019 at 11:48:21 AM, Mark Abraham (mark.j.abra...@gmail.com)
wrote:

Hi,

When you use

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "

then the environment seems to make sure only one GPU is visible. (The log
files report only finding one GPU.) But it's probably the same GPU in each
case, with three remaining idle. I would suggest not using --gres unless
you can specify *which* of the four available GPUs each run can use.

Otherwise, don't use --gres and use the facilities built into GROMACS, e.g.

$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0 -ntomp 20 -gpu_id 0
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10 -ntomp 20 -gpu_id 1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20 -ntomp 20 -gpu_id 2
etc.
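
One way to spell out the pattern above for all four GPUs is a loop. The sketch below only prints the command lines (a dry run) so they can be checked before submission. The mdrun flags and the offsets 0/10/20/30 are copied from this thread; `BASE` and the subshell-per-directory layout are assumptions for illustration.

```shell
#!/bin/bash
# Dry run: build and print one mdrun command line per GPU, without running it.
# DO_PARALLEL drops --gres, as suggested above; all mdrun flags are copied
# verbatim from the thread. BASE is a hypothetical placeholder path.
DO_PARALLEL="srun --exclusive -n 1"
EXE="gmx mdrun"
BASE="/path/to/singlegpu"      # assumption: runs live in $BASE/1 .. $BASE/4

commands=()
for gpu in 0 1 2 3; do
    offset=$((gpu * 10))       # 0, 10, 20, 30 as in the original script
    dir="$BASE/$((gpu + 1))"
    commands+=("(cd $dir && $DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset $offset -ntomp 20 -gpu_id $gpu) &")
done

printf '%s\n' "${commands[@]}"
echo "wait"                    # the real script would end with a plain wait
```

Whether 20 OpenMP threads at offsets 10 apart end up sharing cores depends on the node's hardware-thread layout, which this thread does not settle.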

Mark

On Mon, 29 Jul 2019 at 11:34, Carlos Navarro <carlos.navarr...@gmail.com>
wrote:

Hi Szilárd,
To answer your questions:
**are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
I'm trying to run multiple simulations on the same node at the same time.

** what are you simulating?
Regular and CompEl simulations

** can you provide log files of the runs?
Some log files are available at the following link:
https://www.dropbox.com/s/7q249vbqqwf5r03/Archive.zip?dl=0
In short, alone.log is a single run on the node (using 1 GPU), and
multi1/2/3/4.log are 4 independent simulations run at the same time on a
single node. In all cases, 20 CPUs are used.
Best regards,
Carlos

On Thu, 25 Jul 2019 at 10:59, Szilárd Páll (<pall.szil...@gmail.com>)
wrote:

Hi,

It is not clear to me how you are trying to set up your runs, so
please provide some details:
- are you trying to run multiple simulations concurrently on the same
node or are you trying to strong-scale?
- what are you simulating?
- can you provide log files of the runs?

Cheers,

--
Szilárd

On Tue, Jul 23, 2019 at 1:34 AM Carlos Navarro
<carlos.navarr...@gmail.com> wrote:
Can anyone give me an idea of what might be happening, or how I can
solve it?
Best regards,
Carlos

On July 19, 2019 at 2:20:41 PM, Carlos Navarro (carlos.navarr...@gmail.com)
wrote:

Dear gmx-users,
I'm currently working on a server where each node has 40 physical cores
(40 threads) and 4 Nvidia V100 GPUs.
When I launch a single job (1 simulation using a single GPU) I get a
performance of about 35 ns/day for a system of about 300k atoms. Looking
at the GPU usage during the simulation, I notice that the card is at
about 80% utilization.
The problems arise when I increase the number of jobs running at the same
time. If, for instance, 2 jobs are running at the same time, the
performance drops to ~25 ns/day each, and the GPU usage also drops during
the simulation to about 30-40% (sometimes dropping to less than 5%).
Clearly there is a communication problem between the GPUs and the CPU
during the simulations, but I don't know how to solve this.
Here is the script I use to run the simulations:

#!/bin/bash -x
#SBATCH --job-name=testAtTPC1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=20
#SBATCH --account=hdd22
#SBATCH --nodes=1
#SBATCH --mem=0
#SBATCH --output=sout.%j
#SBATCH --error=s4err.%j
#SBATCH --time=00:10:00
#SBATCH --partition=develgpus
#SBATCH --gres=gpu:4

module use /gpfs/software/juwels/otherstages
module load Stages/2018b
module load Intel/2019.0.117-GCC-7.3.0
module load IntelMPI/2019.0.117
module load GROMACS/2018.3

WORKDIR1=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/1
WORKDIR2=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/2
WORKDIR3=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/3
WORKDIR4=/p/project/chdd22/gromacs/benchmark/AtTPC1/singlegpu/4

DO_PARALLEL=" srun --exclusive -n 1 --gres=gpu:1 "
EXE=" gmx mdrun "

cd $WORKDIR1
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 0 -ntomp 20 &>log &
cd $WORKDIR2
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 10 -ntomp 20 &>log &
cd $WORKDIR3
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 20 -ntomp 20 &>log &
cd $WORKDIR4
$DO_PARALLEL $EXE -s eq6.tpr -deffnm eq6-20 -nmpi 1 -pin on -pinoffset 30 -ntomp 20 &>log &


Regarding pinoffset, I first tried using 20 cores for each job, but then
also tried with 8 cores (so pinoffset 0 for job 1, pinoffset 4 for job 2,
pinoffset 8 for job 3 and pinoffset 12 for job 4), but in the end the
problem persists.
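
A quick way to sanity-check a set of pin offsets is to compare the logical-core ranges they imply. The sketch below assumes each run pins its threads to consecutive logical cores (a stride of 1), which may not match what mdrun actually chooses on a given node; `ranges_overlap` is a hypothetical helper, not part of GROMACS.

```shell
#!/bin/bash
# Hypothetical helper: given threads-per-run and a list of pin offsets,
# report whether any two runs would share a logical core, ASSUMING each run
# occupies the consecutive cores [offset, offset + threads - 1].
ranges_overlap() {
    local threads=$1; shift
    local sorted prev_end=-1 off
    sorted=$(printf '%s\n' "$@" | sort -n)
    for off in $sorted; do
        if [ "$off" -le "$prev_end" ]; then
            echo "overlap"
            return 0
        fi
        prev_end=$((off + threads - 1))
    done
    echo "disjoint"
}

ranges_overlap 8 0 4 8 12      # the 8-thread layout described above
ranges_overlap 8 0 8 16 24     # offsets spaced by the thread count
```

Under that assumption, offsets spaced more closely than the thread count put neighbouring runs on some of the same cores.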

Currently on this machine I'm not able to use more than 1 GPU per job, so
this is my only option for making proper use of the whole node.
If you need more information please just let me know.
Best regards.
Carlos

--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before
posting!
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.
--
==================================================

Justin A. Lemkul, Ph.D.
Assistant Professor
Office: 301 Fralin Hall
Lab: 303 Engel Hall

Virginia Tech Department of Biochemistry
340 West Campus Dr.
Blacksburg, VA 24061

jalem...@vt.edu | (540) 231-3129
http://www.thelemkullab.com

==================================================
