A separate CPU-only build is what we were going to try, but if it does turn out not to touch the GPUs, then what -- do we keep several builds around?

That latency you mention is definitely there; I think it is related to my earlier report of one of the regression tests failing (I think Mark might remember that one). That failure, by the way, persists with the 2018.1 we just installed on a completely different machine.

Alex


On 5/6/2018 4:03 PM, Justin Lemkul wrote:


On 5/6/18 5:51 PM, Alex wrote:
Unfortunately, we're still bogged down when the EM runs (example below) start -- CPU usage by these jobs is initially low while their PIDs show up in nvidia-smi. After about a minute everything goes back to normal. Because the user launches these runs frequently (via a script), everything is slowed down by a large factor. Interestingly, we have another user utilizing a GPU with another MD package (LAMMPS), and that GPU is never touched by these EM jobs.
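
(For what it's worth, a simple way to catch the transient in the act would be to keep a looping nvidia-smi running while one of these EM jobs starts -- just a sketch, nothing GROMACS-specific, and the 1-second interval is arbitrary:

nvidia-smi -l 1

or, equivalently, watch -n 1 nvidia-smi. That should show exactly when the gmx PIDs appear on the devices and for how long.)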

Any ideas will be greatly appreciated.


Thinking out loud - a run that explicitly calls for only the CPU might still be trying to detect GPUs if mdrun is GPU-enabled. Is that a possibility, including some latency in detecting the devices? Have you tested whether an mdrun binary built with GPU support explicitly disabled (-DGMX_GPU=OFF) leaves GPU usage unaffected when running the same command?
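
Something like the following should be all it takes (a sketch, assuming an out-of-source CMake build; the _cpu suffix and the install prefix are only illustrative choices so the CPU-only binary can live alongside the regular one):

cmake .. -DGMX_GPU=OFF -DGMX_BINARY_SUFFIX=_cpu -DCMAKE_INSTALL_PREFIX=/opt/gromacs-2018-cpu
make -j && make install

Rerunning the same EM command with the resulting gmx_cpu binary would then tell you whether the GPU-enabled build's device detection is what you're seeing.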

-Justin

Thanks,

Alex


  PID TTY      STAT   TIME COMMAND
60432 pts/8    Dl+    0:01 gmx mdrun -table ../../../tab_it.xvg -nt 1 -nb cpu -pme cpu -deffnm em_steep





On 4/27/2018 2:16 PM, Mark Abraham wrote:
Hi,

What you think was run isn't nearly as useful when troubleshooting as
asking the kernel what is actually running.
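
For example (a sketch; the nvidia-smi query fields and the ps flags below are standard, and <PID> stands for whatever PID nvidia-smi reports):

# list the processes currently running on the GPUs
nvidia-smi --query-compute-apps=pid,process_name --format=csv
# then ask the kernel for the full command line behind one of those PIDs
ps -o pid,args -p <PID>

The first command shows which PIDs are actually on the devices; the second shows the exact mdrun invocation behind each one, rather than the one you expect to have been launched.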

Mark


On Fri, Apr 27, 2018, 21:59 Alex <nedoma...@gmail.com> wrote:

Mark, I copied the exact command line from the script, right above the mdp file. It is literally how the script calls mdrun in this case:

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm


On 4/27/2018 1:52 PM, Mark Abraham wrote:
The group cutoff scheme can never run on a GPU, so none of that should matter.
Use ps and find out what the command lines were.

Mark

On Fri, Apr 27, 2018, 21:37 Alex <nedoma...@gmail.com> wrote:

Update: we're basically removing commands one by one from the script that submits the jobs causing the issue. The culprit is both the EM and the MD run, and the GPUs are being affected _before_ MD starts loading the CPU, i.e. during the initial setup of the EM run -- CPU load is near zero while nvidia-smi reports the mess. I wonder if this is in any way related to that timing test we were failing a while back.

The mdrun call and mdp are below, though I suspect they have nothing to do with what is happening. Any help will be very highly appreciated.

Alex

***

gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm

mdp:

; Run control
integrator               = md-vv       ; Velocity Verlet
tinit                    = 0
dt                       = 0.002
nsteps                   = 500000    ; 1 ns
nstcomm                  = 100
; Output control
nstxout                  = 50000
nstvout                  = 50000
nstfout                  = 0
nstlog                   = 50000
nstenergy                = 50000
nstxout-compressed       = 0
; Neighborsearching and short-range nonbonded interactions
cutoff-scheme            = group
nstlist                  = 10
ns_type                  = grid
pbc                      = xyz
rlist                    = 1.4
; Electrostatics
coulombtype              = cutoff
rcoulomb                 = 1.4
; van der Waals
vdwtype                  = user
vdw-modifier             = none
rvdw                     = 1.4
; Apply long range dispersion corrections for Energy and Pressure
DispCorr                  = EnerPres
; Spacing for the PME/PPPM FFT grid
fourierspacing           = 0.12
; EWALD/PME/PPPM parameters
pme_order                = 6
ewald_rtol               = 1e-06
epsilon_surface          = 0
; Temperature coupling
Tcoupl                   = nose-hoover
tc_grps                  = system
tau_t                    = 1.0
ref_t                    = some_temperature
; Pressure coupling is off for NVT
Pcoupl                   = No
tau_p                    = 0.5
compressibility          = 4.5e-05
ref_p                    = 1.0
; options for bonds
constraints              = all-bonds
constraint_algorithm     = lincs






On Fri, Apr 27, 2018 at 1:14 PM, Alex <nedoma...@gmail.com> wrote:

As I said, there are only two users, and nvidia-smi shows the process name. We're investigating, and it does appear that it is the EM that uses cutoff electrostatics, and as a result the user did not bother with -pme cpu in the mdrun call. What would be the correct way to enforce a CPU-only mdrun when coulombtype = cutoff?
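
(One blunt way to guarantee it, independent of mdrun's own flags, would be to hide the devices from the job entirely -- a sketch, assuming a CUDA build, using the standard CUDA_VISIBLE_DEVICES mechanism; the run name em_steep is purely illustrative:

CUDA_VISIBLE_DEVICES="" gmx mdrun -nt 2 -nb cpu -pme cpu -deffnm em_steep

With no visible devices even a GPU-enabled binary should find nothing to use; whether that also removes the start-up latency is exactly what would need testing.)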

Thanks,

Alex

On Fri, Apr 27, 2018 at 12:45 PM, Mark Abraham <mark.j.abra...@gmail.com> wrote:

No.

Look at the processes that are running, e.g. with top or ps. Either old simulations or another user's jobs are running.

Mark

On Fri, Apr 27, 2018, 20:33 Alex <nedoma...@gmail.com> wrote:

Strange. There are only two people using this machine, myself being one of them, and the other person specifically forces -nb cpu -pme cpu in his calls to mdrun. Are any other GMX utilities (e.g. insert-molecules, grompp, or energy) trying to use GPUs?

Thanks,

Alex

On Fri, Apr 27, 2018 at 5:33 AM, Szilárd Páll <pall.szil...@gmail.com> wrote:

The second column is the PIDs, so there is a whole lot more going on there than just a single simulation with a single rank using two GPUs -- that would be one PID and two entries, one for each GPU. Are you sure you're not running other processes?

--
Szilárd

On Thu, Apr 26, 2018 at 5:52 AM, Alex <nedoma...@gmail.com> wrote:

Hi all,

I am running GMX 2018 with:

gmx mdrun -pinoffset 0 -pin on -nt 24 -ntmpi 4 -npme 1 -pme gpu -nb gpu -gputasks 1122

Once in a while the simulation slows down and nvidia-smi reports something like this:

|    1     12981      C   gmx                                    175MiB |
|    2     12981      C   gmx                                    217MiB |
|    2     13083      C   gmx                                    161MiB |
|    2     13086      C   gmx                                    159MiB |
|    2     13089      C   gmx                                    139MiB |
|    2     13093      C   gmx                                    163MiB |
|    2     13096      C   gmx                                     11MiB |
|    2     13099      C   gmx                                      8MiB |
|    2     13102      C   gmx                                      8MiB |
|    2     13106      C   gmx                                      8MiB |
|    2     13109      C   gmx                                      8MiB |
|    2     13112      C   gmx                                      8MiB |
|    2     13115      C   gmx                                      8MiB |
|    2     13119      C   gmx                                      8MiB |
|    2     13122      C   gmx                                      8MiB |
|    2     13125      C   gmx                                      8MiB |
|    2     13128      C   gmx                                      8MiB |
|    2     13131      C   gmx                                      8MiB |
|    2     13134      C   gmx                                      8MiB |
|    2     13138      C   gmx                                      8MiB |
|    2     13141      C   gmx                                      8MiB |
+-------------------------------------------------------------------------+

Then it goes back to the expected load. Is this normal?

Thanks,

Alex
