The Quadro K2200 is a low-end GPU several generations old, and I strongly doubt you will see any benefit from using it.
I suggest you try running:

mdrun -nb gpu -ntmpi 1 -ntomp 36 -pin on

which will give you the (most likely best) performance you can get when
using both the high-end Intel CPUs and the GPU. Compare that to:

mdrun -nb cpu -ntmpi 1 -ntomp 36 -pin on
mdrun -nb cpu -ntmpi 36 -ntomp 2 -pin on   [with or without -npme 0]
mdrun -nb cpu -ntmpi 18 -ntomp 4 -pin on   [with or without -npme 0]

I suspect one of the latter two will be the fastest. You can also try
using no domain decomposition and 72 threads per rank by recompiling
GROMACS with the cmake option -DGMX_OPENMP_MAX_THREADS=128. That could
end up being faster than the first suggested CPU run, but it is not
likely to beat the latter two by a relevant amount, if at all.

If the above technicalities are not clear and you would like to
understand them better, I recommend reading (again?) the relevant parts
of the user guide.

Cheers,
--
Szilárd

On Wed, Feb 27, 2019 at 12:27 PM Lalehan Ozalp <lalehan.oz...@gmail.com> wrote:
> Dear Szilárd,
>
> There is indeed one GPU. And please keep in mind I used to use the
> -nt 72 option BEFORE the 2019-dev version. It looks like it employs the
> GPU by default, and I don't know how to use it efficiently, apparently.
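[Editor's note: the -ntmpi/-ntomp combinations above are constrained by the machine's 72 logical CPUs and by the GMX_OPENMP_MAX_THREADS limit baked into the build (64 in the build info quoted later in this thread, which is why -ntomp 72 requires a recompile). A minimal Python sketch of that arithmetic, not part of the original mail:]

```python
# Enumerate -ntmpi x -ntomp layouts that exactly fill 72 logical CPUs.
# The per-rank OpenMP thread count is capped by GMX_OPENMP_MAX_THREADS
# (64 in this build), so a single rank with 72 threads needs a recompile
# with e.g. -DGMX_OPENMP_MAX_THREADS=128.

def layouts(n_cpus=72, max_ntomp=64):
    """Return all (ntmpi, ntomp) pairs with ntmpi * ntomp == n_cpus
    and ntomp within the build-time thread limit."""
    return [(ntmpi, n_cpus // ntmpi)
            for ntmpi in range(1, n_cpus + 1)
            if n_cpus % ntmpi == 0 and n_cpus // ntmpi <= max_ntomp]

print(layouts())               # (1, 72) is missing under the default cap
print(layouts(max_ntomp=128))  # raising the cap makes (1, 72) available
```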
> Here is the info you asked for:
>
> System size: 130655 atoms
>
> .mdp file:
>
> ; Run parameters
> integrator              = md                 ; leap-frog integrator
> nsteps                  = 15000000           ; 2 * 15000000 = 30000 ps (30 ns)
> dt                      = 0.002              ; 2 fs
> ; Output control
> nstenergy               = 5000               ; save energies every 10.0 ps
> nstlog                  = 5000               ; update log file every 10.0 ps
> nstxout-compressed      = 5000               ; save coordinates every 10.0 ps
> ; Bond parameters
> continuation            = yes                ; continuing from NPT
> constraint_algorithm    = lincs              ; holonomic constraints
> constraints             = h-bonds            ; bonds to H are constrained
> lincs_iter              = 1                  ; accuracy of LINCS
> lincs_order             = 4                  ; also related to accuracy
> ; Neighbor searching and vdW
> cutoff-scheme           = Verlet
> ns_type                 = grid               ; search neighboring grid cells
> nstlist                 = 20                 ; largely irrelevant with Verlet
> rlist                   = 1.2
> vdwtype                 = cutoff
> vdw-modifier            = force-switch
> rvdw-switch             = 1.0
> rvdw                    = 1.2                ; short-range van der Waals cutoff (in nm)
> ; Electrostatics
> coulombtype             = PME                ; Particle Mesh Ewald for long-range electrostatics
> rcoulomb                = 1.2
> pme_order               = 4                  ; cubic interpolation
> fourierspacing          = 0.16               ; grid spacing for FFT
> ; Temperature coupling
> tcoupl                  = V-rescale          ; modified Berendsen thermostat
> tc-grps                 = Protein_nap_16 Water_and_ions ; two coupling groups - more accurate
> tau_t                   = 0.1 0.1            ; time constant, in ps
> ref_t                   = 300 300            ; reference temperature, one for each group, in K
> ; Pressure coupling
> pcoupl                  = Parrinello-Rahman  ; pressure coupling is on for NPT
> pcoupltype              = isotropic          ; uniform scaling of box vectors
>
> My command:
>
> gmx mdrun -deffnm md_0_30 -ntmpi 4 -ntomp 18 -npme 1 -pme gpu -nb gpu
>
> And what the program prints in the log file once I run it:
>
> GROMACS version:    2019-dev
> Precision:          single
> Memory model:       64 bit
> MPI library:        thread_mpi
> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support:        CUDA
> SIMD instructions:  NONE
> FFT library:        fftw-3.3.8
> RDTSCP usage:       disabled
> TNG support:        enabled
> Hwloc support:      disabled
> Tracing support:    disabled
> Built on:           2019-01-22 13:53:24
> Build CPU vendor:   Unknown
> Build CPU brand:    Unknown
> Build CPU family:   0   Model: 0   Stepping: 0
> Build CPU features: Unknown
> C compiler:         /usr/local/bin/gcc GNU 5.3.0
> C++ compiler flags: -std=c++11 -Wundef -Wextra -Wno-missing-field-initializers
>                     -Wpointer-arith -Wmissing-declarations -Wall -O3 -DNDEBUG
>                     -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
> CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
>                     driver; Copyright (c) 2005-2016 NVIDIA Corporation;
>                     Built on Tue_Jan_10_13:22:03_CST_2017; Cuda compilation
>                     tools, release 8.0, V8.0.61
>
> Running on 1 node with total 36 cores, 72 logical cores, 1 compatible GPU
> Hardware detected:
>   CPU info:
>     Vendor: Intel
>     Brand:  Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
>     Family: 6   Model: 63   Stepping: 2
>   GPU info:
>     Number of GPUs detected: 1
>     #0: NVIDIA Quadro K2200, compute cap.: 5.0, ECC: no, stat: compatible
>
> Highest SIMD level requested by all nodes in run: AVX2_256
> SIMD instructions selected at compile time: None
> This program was compiled for different hardware than you are running on,
> which could influence performance.
>
> The current CPU can measure timings more accurately than the code in
> gmx mdrun was configured to use. This might affect your simulation
> speed as accurate timings are needed for load-balancing.
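[Editor's note: when the suggested run variants are benchmarked against each other, the number to compare is the ns/day figure on the "Performance:" line near the end of each md.log. A small Python sketch, using invented log excerpts and invented performance values purely for illustration:]

```python
import re

def ns_per_day(log_text):
    """Pull the ns/day value from mdrun's 'Performance:' summary line."""
    m = re.search(r"^Performance:\s+(\d+(?:\.\d+)?)", log_text, re.MULTILINE)
    return float(m.group(1)) if m else None

# Invented excerpts standing in for the md.log of three benchmark runs;
# the two columns on the real line are ns/day and hours/ns.
runs = {
    "-nb gpu -ntmpi 1 -ntomp 36": "Performance:       10.2        2.353",
    "-nb cpu -ntmpi 36 -ntomp 2": "Performance:       14.7        1.633",
    "-nb cpu -ntmpi 18 -ntomp 4": "Performance:       14.1        1.702",
}
best = max(runs, key=lambda k: ns_per_day(runs[k]))
print(best)  # the command line whose (invented) log reports the highest ns/day
```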
> Hardware:
>
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                72
> On-line CPU(s) list:   0-71
> Thread(s) per core:    2
> Core(s) per socket:    18
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 63
> Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
> Stepping:              2
> CPU MHz:               1200.000
> BogoMIPS:              4589.66
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              46080K
> NUMA node0 CPU(s):     0-17,36-53
> NUMA node1 CPU(s):     18-35,54-71
>
> GPU:
>
> 03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL
> [Quadro K2200] [10de:13ba] (rev a2) (prog-if 00 [VGA controller])
>         Subsystem: NVIDIA Corporation Device [10de:1097]
>         Physical Slot: 2
>         Flags: bus master, fast devsel, latency 0, IRQ 232
>         Memory at d2000000 (32-bit, non-prefetchable) [size=16M]
>         Memory at c0000000 (64-bit, prefetchable) [size=256M]
>         Memory at d0000000 (64-bit, prefetchable) [size=32M]
>         I/O ports at 4000 [size=128]
>         [virtual] Expansion ROM at d3000000 [disabled] [size=512K]
>         Capabilities: <access denied>
>         Kernel driver in use: nvidia
>         Kernel modules: nvidia-drm, nvidia, nouveau, nvidiafb
>
> Hope I didn't flood you with too much information.
> Thank you very much for your interest.
> Best,
>
> Lalehan
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
> send a mail to gmx-users-requ...@gromacs.org.
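[Editor's note: as a sanity check on the quoted .mdp, the run-length and output-interval comments in it can be reproduced with a few lines of arithmetic; the values below are taken directly from the file above.]

```python
# Reproduce the derived quantities claimed in the .mdp comments.
dt = 0.002           # ps per step (2 fs)
nsteps = 15_000_000  # md steps
nstout = 5_000       # nstenergy / nstlog / nstxout-compressed

total_ns = nsteps * dt / 1000.0  # total simulated time: 30 ns, as the comment says
interval_ps = nstout * dt        # output every 10.0 ps
frames = nsteps // nstout        # 3000 compressed frames in the trajectory
print(total_ns, interval_ps, frames)
```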