Dear Szilárd,

There is indeed one GPU. Please keep in mind that I used the -nt 72 option with versions BEFORE 2019-dev. The new version seems to use the GPU by default, and apparently I don't know how to use it efficiently.

Here is the info you asked for:

System size: 130655 atoms

.mdp file:

; Run parameters
integrator              = md                ; leap-frog integrator
nsteps                  = 15000000          ; 2 fs * 15000000 = 30000 ps (30 ns)
dt                      = 0.002             ; 2 fs
; Output control
nstenergy               = 5000              ; save energies every 10.0 ps
nstlog                  = 5000              ; update log file every 10.0 ps
nstxout-compressed      = 5000              ; save coordinates every 10.0 ps
; Bond parameters
continuation            = yes               ; continuing from NPT
constraint_algorithm    = lincs             ; holonomic constraints
constraints             = h-bonds           ; bonds to H are constrained
lincs_iter              = 1                 ; accuracy of LINCS
lincs_order             = 4                 ; also related to accuracy
; Neighbor searching and vdW
cutoff-scheme           = Verlet
ns_type                 = grid              ; search neighboring grid cells
nstlist                 = 20                ; largely irrelevant with Verlet
rlist                   = 1.2
vdwtype                 = cutoff
vdw-modifier            = force-switch
rvdw-switch             = 1.0
rvdw                    = 1.2               ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype             = PME               ; Particle Mesh Ewald for long-range electrostatics
rcoulomb                = 1.2
pme_order               = 4                 ; cubic interpolation
fourierspacing          = 0.16              ; grid spacing for FFT
; Temperature coupling
tcoupl                  = V-rescale         ; modified Berendsen thermostat
tc-grps                 = Protein_nap_16 Water_and_ions ; two coupling groups - more accurate
tau_t                   = 0.1 0.1           ; time constant, in ps
ref_t                   = 300 300           ; reference temperature, one for each group, in K
; Pressure coupling
pcoupl                  = Parrinello-Rahman ; pressure coupling is on for NPT
pcoupltype              = isotropic         ; uniform scaling of box vectors

My command:

gmx mdrun -deffnm md_0_30 -ntmpi 4 -ntomp 18 -npme 1 -pme gpu -nb gpu

and what the program prints in the log file once I run it:

GROMACS version:    2019-dev
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  NONE
FFT library:        fftw-3.3.8
RDTSCP usage:       disabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
Built on:           2019-01-22 13:53:24
Build CPU vendor:   Unknown
Build CPU brand:    Unknown
Build CPU family:   0   Model: 0   Stepping: 0
Build CPU features: Unknown
C compiler:         /usr/local/bin/gcc GNU 5.3.0
C++ compiler flags: -std=c++11 -Wundef -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wmissing-declarations -Wall -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2016 NVIDIA Corporation;Built on Tue_Jan_10_13:22:03_CST_2017;Cuda compilation tools, release 8.0, V8.0.61

Running on 1 node with total 36 cores, 72 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
    Family: 6   Model: 63   Stepping: 2
  GPU info:
    Number of GPUs detected: 1
    #0: NVIDIA Quadro K2200, compute cap.: 5.0, ECC: no, stat: compatible

Highest SIMD level requested by all nodes in run: AVX2_256
SIMD instructions selected at compile time:       None
This program was compiled for different hardware than you are running on,
which could influence performance.
The current CPU can measure timings more accurately than the code in
gmx mdrun was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.

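As a quick sanity check on the numbers above, here is a minimal Python sketch. The variable names are only illustrative; the values are copied from the .mdp file, the mdrun command line, and the log header. It confirms the total run length, the number of compressed-trajectory frames, and that the requested ranks times OpenMP threads match the 72 logical cores the log reports.

```python
# Values copied from the .mdp file, the command line, and the log above.
nsteps = 15_000_000          # nsteps
dt_ps = 0.002                # dt, in ps
nstxout_compressed = 5000    # nstxout-compressed
ntmpi = 4                    # -ntmpi: thread-MPI ranks
ntomp = 18                   # -ntomp: OpenMP threads per rank
logical_cores = 72           # "72 logical cores" in the log
physical_cores = 36          # "total 36 cores" in the log

# Total simulated time: number of steps times the time step.
total_time_ns = nsteps * dt_ps / 1000.0

# One compressed frame is written every nstxout_compressed steps.
frames = nsteps // nstxout_compressed
frame_interval_ps = nstxout_compressed * dt_ps

# Threads requested by the command line: ranks times threads per rank.
total_threads = ntmpi * ntomp

print(f"run length: {total_time_ns:.1f} ns")
print(f"frames written: {frames} (one every {frame_interval_ps:.1f} ps)")
print(f"threads requested: {total_threads} of {logical_cores} logical cores")
print(f"threads per physical core: {total_threads / physical_cores:.1f}")
```

The last line makes explicit that this layout uses both hardware threads of every physical core.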
Hardware:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Stepping:              2
CPU MHz:               1200.000
BogoMIPS:              4589.66
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-17,36-53
NUMA node1 CPU(s):     18-35,54-71

GPU:

03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Quadro K2200] [10de:13ba] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation Device [10de:1097]
        Physical Slot: 2
        Flags: bus master, fast devsel, latency 0, IRQ 232
        Memory at d2000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 4000 [size=128]
        [virtual] Expansion ROM at d3000000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidia-drm, nvidia, nouveau, nvidiafb

I hope I haven't flooded you with too much information. Thank you very much for your interest.

Best,
Lalehan