A separate CPU-only build is what we were going to try, but if it does turn out not to touch the GPUs, then what -- do we keep several builds around?

That latency you mention is definitely there; I think it is related to my earlier report of one of the regression tests failing (I think Mark might remember that one). That failure, by the way, persists with the 2018.1 we just installed on a completely different machine.

Alex


On 5/6/2018 4:03 PM, Justin Lemkul wrote:


On 5/6/18 5:51 PM, Alex wrote:
Unfortunately, we're still bogged down when the EM runs (example below) start -- CPU usage by these jobs is initially low while their PIDs show up in nvidia-smi. After about a minute everything goes back to normal. Because the user launches these runs frequently (via a script), everything is slowed down by a large factor. Interestingly, we have another user utilizing a GPU with another MD package (LAMMPS), and that GPU is never touched by these EM jobs.
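
(For what it's worth, a simple way to catch the transient in the act would be to keep a looping nvidia-smi running while one of these EM jobs starts -- just a sketch, nothing GROMACS-specific, and the 1-second interval is arbitrary:

nvidia-smi -l 1

or, equivalently, watch -n 1 nvidia-smi. That should show exactly when the gmx PIDs appear on the devices and for how long.)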

Any ideas will be greatly appreciated.


Thinking out loud - a run that explicitly calls for only the CPU might still be trying to detect GPUs if mdrun is GPU-enabled. Is that a possibility, including some latency in detecting the devices? Have you tested whether an mdrun binary built with GPU support explicitly disabled (-DGMX_GPU=OFF) leaves GPU usage unaffected when running the same command?
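
Something like the following should be all it takes (a sketch, assuming an out-of-source CMake build; the _cpu suffix and the install prefix are only illustrative choices so the CPU-only binary can live alongside the regular one):

cmake .. -DGMX_GPU=OFF -DGMX_BINARY_SUFFIX=_cpu -DCMAKE_INSTALL_PREFIX=/opt/gromacs-2018-cpu
make -j && make install

Rerunning the same EM command with the resulting gmx_cpu binary would then tell you whether the GPU-enabled build's device detection is what you're seeing.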

-Justin

Thanks,

Alex


  PID TTY      STAT   TIME COMMAND
60432 pts/8    Dl+    0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep





On 4/27/2018 2:16 PM, Mark Abraham wrote:
Hi,

What you think was run isn't nearly as useful when troubleshooting as
asking the kernel what is actually running.
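
For example (a sketch; the nvidia-smi query fields and the ps flags below are standard, and <PID> stands for whatever PID nvidia-smi reports):

# list the processes currently running on the GPUs
nvidia-smi --query-compute-apps=pid,process_name --format=csv
# then ask the kernel for the full command line behind one of those PIDs
ps -o pid,args -p <PID>

The first command shows which PIDs are actually on the devices; the second shows the exact mdrun invocation behind each one, rather than the one you expect to have been launched.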

Mark


On Fri, Apr 27, 2018, 21:59 Alex <nedoma...@gmail.com> wrote:

Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case:

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm


On 4/27/2018 1:52 PM, Mark Abraham wrote:
The group cutoff scheme can never run on a GPU, so none of that should matter.
Use ps and find out what the command lines were.

Mark

On Fri, Apr 27, 2018, 21:37 Alex <nedoma...@gmail.com> wrote:

Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both the EM and the MD run, and the GPUs are being affected _before_ MD starts loading the CPU, i.e. during the initial setup of the EM run -- CPU load is near zero while nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back.

The mdrun call and mdp are below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated.

Alex

***

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm

mdp:

; Run control
integrator               = md-vv       ; Velocity Verlet
tinit                    = 0
dt                       = 0.002
nsteps                   = 500000    ; 1 ns
nstcomm                  = 100
; Output control
nstxout                  = 50000
nstvout                  = 50000
nstfout                  = 0
nstlog                   = 50000
nstenergy                = 50000
nstxout-compressed       = 0
; Neighborsearching and short-range nonbonded interactions
cutoff-scheme            = group
nstlist                  = 10
ns_type                  = grid
pbc                      = xyz
rlist                    = 1.4
; Electrostatics
coulombtype              = cutoff
rcoulomb                 = 1.4
; van der Waals
vdwtype                  = user
vdw-modifier             = none
rvdw                     = 1.4
; Apply long range dispersion corrections for Energy and Pressure
DispCorr                  = EnerPres
; Spacing for the PME/PPPM FFT grid
fourierspacing           = 0.12
; EWALD/PME/PPPM parameters
pme_order                = 6
ewald_rtol               = 1e-06
epsilon_surface          = 0
; Temperature coupling
Tcoupl                   = nose-hoover
tc_grps                  = system
tau_t                    = 1.0
ref_t                    = some_temperature
; Pressure coupling is off for NVT
Pcoupl                   = No
tau_p                    = 0.5
compressibility          = 4.5e-05
ref_p                    = 1.0
; options for bonds
constraints              = all-bonds
constraint_algorithm     = lincs






On Fri, Apr 27, 2018 at 1:14 PM, Alex <nedoma...@gmail.com> wrote:

As I said, there are only two users, and nvidia-smi shows the process name. We're investigating, and it does appear that it is the EM that uses cutoff electrostatics, and as a result the user did not bother with -pme cpu in the mdrun call. What would be the correct way to enforce a CPU-only mdrun when coulombtype = cutoff?
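
(One blunt way to guarantee it, independent of mdrun's own flags, would be to hide the devices from the job entirely -- a sketch, assuming a CUDA build, using the standard CUDA_VISIBLE_DEVICES mechanism; the run name em_steep is purely illustrative:

CUDA_VISIBLE_DEVICES="" gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm em_steep

With no visible devices even a GPU-enabled binary should find nothing to use; whether that also removes the start-up latency is exactly what would need testing.)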

Thanks,

Alex

On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham <mark.j.abra...@gmail.com> wrote:

No.

Look at the processes that are running, e.g. with top or ps. Either old simulations or another user's jobs are running.

Mark

On Fri, Apr 27, 2018, 20:33 Alex <nedoma...@gmail.com> wrote:

Strange. There are only two people using this machine, myself being one of them, and the other person specifically forces -nb cpu -pme cpu in his calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, grompp, or energy) trying to use GPUs?

Thanks,

Alex

On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll <pall.szil...@gmail.com> wrote:

The second column is the PIDs, so there is a whole lot more going on there than just a single simulation with a single rank using two GPUs -- that would be one PID and two entries, one for each GPU. Are you sure you're not running other processes?

--
Szilárd

On Thu, Apr 26, 2018 at 5:52 AM, Alex <nedoma...@gmail.com> wrote:

Hi all,

I am running GMX 2018 with:

gmx mdrun -pinoffset 0 -pin on -nt 24 -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122

Once in a while the simulation slows down and nvidia-smi reports something like this:

|    1     12981      C   gmx                                    175MiB |
|    2     12981      C   gmx                                    217MiB |
|    2     13083      C   gmx                                    161MiB |
|    2     13086      C   gmx                                    159MiB |
|    2     13089      C   gmx                                    139MiB |
|    2     13093      C   gmx                                    163MiB |
|    2     13096      C   gmx                                     11MiB |
|    2     13099      C   gmx                                      8MiB |
|    2     13102      C   gmx                                      8MiB |
|    2     13106      C   gmx                                      8MiB |
|    2     13109      C   gmx                                      8MiB |
|    2     13112      C   gmx                                      8MiB |
|    2     13115      C   gmx                                      8MiB |
|    2     13119      C   gmx                                      8MiB |
|    2     13122      C   gmx                                      8MiB |
|    2     13125      C   gmx                                      8MiB |
|    2     13128      C   gmx                                      8MiB |
|    2     13131      C   gmx                                      8MiB |
|    2     13134      C   gmx                                      8MiB |
|    2     13138      C   gmx                                      8MiB |
|    2     13141      C   gmx                                      8MiB |
+-------------------------------------------------------------------------+

Then it goes back to the expected load. Is this normal?

Thanks,

Alex
