Mark, thank you. I have an issue that I cannot find a way to solve.
My MD run fails at the very beginning when using the GPU, while a CPU-only run with the same tpr file works fine. I cannot find what "HtoD cudaMemcpyAsync failed: invalid argument" means. Here are some diagnostics.

$ uname -a
Linux didesk 4.15.0-36-generic #39-Ubuntu SMP Mon Sep 24 16:19:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

dikov@didesk ~ $ gcc --version
gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

dikov@didesk ~ $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

GPU setup:
sudo nvidia-smi -pm ENABLED -i 0
sudo nvidia-smi -ac 4513,1733 -i 0

MD.log:

Log file opened on Tue Nov 13 16:16:22 2018
Host: didesk  pid: 45669  rank ID: 0  number of ranks: 1

                :-) GROMACS - gmx mdrun, 2018.3 (-:

GROMACS is written by:
Emile Apol, Rossen Apostolov, Paul Bauer, Herman J.C. Berendsen,
Par Bjelkmar, Aldert van Buuren, Rudi van Drunen, Anton Feenstra,
Gerrit Groenhof, Aleksei Iupinov, Christoph Junghans, Anca Hamuraru,
Vincent Hindriksen, Dimitrios Karkoulis, Peter Kasson, Jiri Kraus,
Carsten Kutzner, Per Larsson, Justin A. Lemkul, Viveca Lindahl,
Magnus Lundborg, Pieter Meulenhoff, Erik Marklund, Teemu Murtola,
Szilard Pall, Sander Pronk, Roland Schulz, Alexey Shvetsov,
Michael Shirts, Alfons Sijbers, Peter Tieleman, Teemu Virolainen,
Christian Wennberg, and Maarten Wolf
and the project leaders:
Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.
GROMACS is free software; you can redistribute it and/or modify it under
the terms of the GNU Lesser General Public License as published by the Free
Software Foundation; either version 2.1 of the License, or (at your option)
any later version.

GROMACS:      gmx mdrun, version 2018.3
Executable:   /usr/local/gromacs/bin/gmx
Data prefix:  /usr/local/gromacs
Working dir:  /home/dikov/Documents/Cients/DavidL/MD/GPU
Command line:
  gmx mdrun -deffnm md200ns -v

GROMACS version:    2018.3
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX_512
FFT library:        fftw-3.3.7-sse2-avx
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.6
Tracing support:    disabled
Built on:           2018-11-13 21:31:10
Built by:           dikov@didesk [CMAKE]
Build OS/arch:      Linux 4.15.0-36-generic x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Build CPU family:   6   Model: 85   Stepping: 4
Build CPU features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/cc GNU 7.3.0
C compiler flags:   -mavx512f -mfma -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler:       /usr/bin/c++ GNU 7.3.0
C++ compiler flags: -mavx512f -mfma -std=c++11 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
CUDA compiler:      /usr/local/cuda/bin/nvcc; nvcc: NVIDIA (R) Cuda compiler driver; Copyright (c) 2005-2018 NVIDIA Corporation; Built on Sat_Aug_25_21:08:01_CDT_2018; Cuda compilation tools, release 10.0, V10.0.130
CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70;-use_fast_math;-D_FORCE_INLINES;-mavx512f;-mfma;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast
CUDA driver:        10.0
CUDA runtime:       10.0

Running on 1 node with total 36 cores, 72 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
    Family: 6   Model: 85   Stepping: 4
    Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    Number of AVX-512 FMA units: 2
  Hardware topology: Full, with devices
    Sockets, cores, and logical processors:
      Socket 0: [ 0 36] [ 1 37] [ 2 38] [ 3 39] [ 4 40] [ 5 41] [ 6 42] [ 7 43] [ 8 44] [ 9 45] [ 10 46] [ 11 47] [ 12 48] [ 13 49] [ 14 50] [ 15 51] [ 16 52] [ 17 53]
      Socket 1: [ 18 54] [ 19 55] [ 20 56] [ 21 57] [ 22 58] [ 23 59] [ 24 60] [ 25 61] [ 26 62] [ 27 63] [ 28 64] [ 29 65] [ 30 66] [ 31 67] [ 32 68] [ 33 69] [ 34 70] [ 35 71]
    Numa nodes:
      Node 0 (33376423936 bytes mem): 0 36 1 37 2 38 3 39 4 40 5 41 6 42 7 43 8 44 9 45 10 46 11 47 12 48 13 49 14 50 15 51 16 52 17 53
      Node 1 (33792262144 bytes mem): 18 54 19 55 20 56 21 57 22 58 23 59 24 60 25 61 26 62 27 63 28 64 29 65 30 66 31 67 32 68 33 69 34 70 35 71
      Latency:
               0     1
         0  1.00  2.10
         1  2.10  1.00
    Caches:
      L1: 32768 bytes, linesize 64 bytes, assoc. 8, shared 2 ways
      L2: 1048576 bytes, linesize 64 bytes, assoc. 16, shared 2 ways
      L3: 25952256 bytes, linesize 64 bytes, assoc. 11, shared 36 ways
    PCI devices:
      0000:00:11.5  Id: 8086:a1d2  Class: 0x0106  Numa: 0
      0000:00:16.2  Id: 8086:a1bc  Class: 0x0101  Numa: 0
      0000:00:17.0  Id: 8086:2826  Class: 0x0104  Numa: 0
      0000:02:00.0  Id: 8086:1533  Class: 0x0200  Numa: 0
      0000:00:1f.6  Id: 8086:15b9  Class: 0x0200  Numa: 0
      0000:91:00.0  Id: 144d:a808  Class: 0x0108  Numa: 0
      0000:d5:00.0  Id: 10de:1bb0  Class: 0x0300  Numa: 0
  GPU info:
    Number of GPUs detected: 1
    #0: NVIDIA Quadro P5000, compute cap.: 6.1, ECC: no, stat: compatible

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E. Lindahl
GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C. Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 100000000
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 718849372
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 1000
   nstcalcenergy                  = 100
   nstenergy                      = 5000
   nstxout-compressed             = 5000
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 20
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = false
   verlet-buffer-tolerance        = 0.005
   rlist                          = 0.931
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.9
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.9
   DispCorr                       = EnerPres
   table-extension                = 1
   fourierspacing                 = 0.16
   fourier-nx                     = 52
   fourier-ny                     = 60
   fourier-nz                     = 72
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = V-rescale
   nsttcouple                     = 20
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = false
   pcoupl                         = Parrinello-Rahman
   pcoupltype                     = Isotropic
   nstpcouple                     = 20
   tau-p                          = 2
   compressibility (3x3):
      compressibility[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
   ref-p (3x3):
      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = false
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
   qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = true
   Shake-SOR                      = false
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = false
   awh                            = false
   rotation                       = false
   interactiveMD                  = false
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = false
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = false
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
   applied-forces:
     electric-field:
       x: E0 = 0, omega = 0, t0 = 0, sigma = 0
       y: E0 = 0, omega = 0, t0 = 0, sigma = 0
       z: E0 = 0, omega = 0, t0 = 0, sigma = 0
   grpopts:
     nrdf:             16011.7      141396
     ref-t:                300         300
     tau-t:                0.1         0.1
     annealing:             No          No
     annealing-npoints:      0           0
     acc:                    0 0 0
     nfreeze:                N N N
     energygrp-flags[  0]: 0

Changing nstlist from 20 to 80, rlist from 0.931 to 1.049

Using 1 MPI thread
Using 36 OpenMP threads

1 GPU auto-selected for this run.
Mapping of GPU IDs to the 2 GPU tasks in the 1 rank on this node:
  PP:0,PME:0

Application clocks (GPU clocks) for Quadro P5000 are (4513,1733)
Application clocks (GPU clocks) for Quadro P5000 are (4513,1733)

Pinning threads with an auto-selected logical core stride of 2
System total charge: -0.000
Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Ewald -1.111e-05
Initialized non-bonded Ewald correction tables, spacing: 8.85e-04 size: 1018
Long Range LJ corr.: <C6> 3.3851e-04
Generated table with 1024 data points for Ewald. Tabscale = 500 points/nm
Generated table with 1024 data points for LJ6. Tabscale = 500 points/nm
Generated table with 1024 data points for LJ12. Tabscale = 500 points/nm
Generated table with 1024 data points for 1-4 COUL. Tabscale = 500 points/nm
Generated table with 1024 data points for 1-4 LJ6. Tabscale = 500 points/nm
Generated table with 1024 data points for 1-4 LJ12. Tabscale = 500 points/nm

Using GPU 8x8 nonbonded short-range kernels

Using a dual 8x4 pair-list setup updated with dynamic, rolling pruning:
  outer list: updated every 80 steps, buffer 0.149 nm, rlist 1.049 nm
  inner list: updated every 10 steps, buffer 0.003 nm, rlist 0.903 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
  outer list: updated every 80 steps, buffer 0.292 nm, rlist 1.192 nm
  inner list: updated every 10 steps, buffer 0.043 nm, rlist 0.943 nm

Using Lorentz-Berthelot Lennard-Jones combination rule

Initializing LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------

The number of constraints is 8054

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

Intra-simulation communication will occur every 20 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0: rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 78646 Atoms

Started mdrun on rank 0 Tue Nov 13 16:16:25 2018

           Step           Time
              0        0.00000

-------------------------------------------------------
Program:     gmx mdrun, version 2018.3
Source file: src/gromacs/gpu_utils/cudautils.cu (line 110)

Fatal error:
HtoD cudaMemcpyAsync failed: invalid argument

For more information and tips for troubleshooting, please check the GROMACS
website at http://www.gromacs.org/Documentation/Errors
-------------------------------------------------------

Thank you,

Dmytro

________________________________________
From: gromacs.org_gmx-users-boun...@maillist.sys.kth.se <gromacs.org_gmx-users-boun...@maillist.sys.kth.se> on behalf of Mark Abraham <mark.j.abra...@gmail.com>
Sent: Tuesday, November 13, 2018 10:29 PM
To: gmx-us...@gromacs.org
Cc: gromacs.org_gmx-users@maillist.sys.kth.se
Subject: Re: [gmx-users] Running GPU issue

Hi,

It can share.
Mark

On Mon, Nov 12, 2018 at 10:19 PM Kovalskyy, Dmytro <kovals...@uthscsa.edu> wrote:
> Hi,
>
> To perform GPU with Gromacs does it require exclusive GPU card or Gromacs
> can share the video card with X-server?
>
> Thank you
>
> Dmytro

--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org.