Sadly, I can't recommend packaged versions of GROMACS for anything other than pre- or post-processing or non-performance-critical work; to stay portable across machines, these are compiled without proper SIMD support, which is generally wasteful. A source build along the lines sketched below avoids this.
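For illustration, a minimal sketch (not from the original message) of a source build tuned for the machine in the quoted log below; GMX_SIMD=AVX_128_FMA is the level the log says the hardware supports, GMX_USE_RDTSCP=ON addresses the timing warning the log prints, the OpenCL options follow the 2018 install guide, and the install prefix and -j count are placeholders:

  # Tuned source build of GROMACS 2018.2 with OpenCL (sketch only):
  tar xf gromacs-2018.2.tar.gz && cd gromacs-2018.2
  mkdir build && cd build
  cmake .. -DGMX_SIMD=AVX_128_FMA -DGMX_USE_RDTSCP=ON \
           -DGMX_GPU=ON -DGMX_USE_OPENCL=ON \
           -DCMAKE_INSTALL_PREFIX=$HOME/opt/gromacs-2018.2
  make -j 8 && make install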
Also, I can't (yet) recommend AMD GPUs as a buying option for consumer-grade stuff, as we don't yet have PME offload support in OpenCL, but this will change soon. Additionally, and importantly, I can't recommend the Mesa stack; it's just not competitive in performance. Use ROCm (or AMDGPU-PRO); a quick way to check which stack is in use is sketched after the quoted log below.

--
Szilárd

On Mon, Sep 10, 2018 at 8:21 PM Benson Muite <benson.mu...@ut.ee> wrote:
> Some results (probably suboptimal) for d.poly-ch2 on a desktop running
> Fedora 28 and using Gromacs-Opencl from Fedora repositories:
>
> Log file opened on Mon Sep 10 21:00:25 2018
> Host: mikihir  pid: 32669  rank ID: 0  number of ranks: 1
>           :-) GROMACS - gmx mdrun, 2018.2 (-:
>
> GROMACS is written by:
>     Emile Apol          Rossen Apostolov     Paul Bauer          Herman J.C. Berendsen
>     Par Bjelkmar        Aldert van Buuren    Rudi van Drunen     Anton Feenstra
>     Gerrit Groenhof     Aleksei Iupinov      Christoph Junghans  Anca Hamuraru
>     Vincent Hindriksen  Dimitrios Karkoulis  Peter Kasson        Jiri Kraus
>     Carsten Kutzner     Per Larsson          Justin A. Lemkul    Viveca Lindahl
>     Magnus Lundborg     Pieter Meulenhoff    Erik Marklund       Teemu Murtola
>     Szilard Pall        Sander Pronk         Roland Schulz       Alexey Shvetsov
>     Michael Shirts      Alfons Sijbers       Peter Tieleman      Teemu Virolainen
>     Christian Wennberg  Maarten Wolf
> and the project leaders:
>     Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel
>
> Copyright (c) 1991-2000, University of Groningen, The Netherlands.
> Copyright (c) 2001-2017, The GROMACS development team at
> Uppsala University, Stockholm University and
> the Royal Institute of Technology, Sweden.
> check out http://www.gromacs.org for more information.
>
> GROMACS is free software; you can redistribute it and/or modify it
> under the terms of the GNU Lesser General Public License
> as published by the Free Software Foundation; either version 2.1
> of the License, or (at your option) any later version.
>
> GROMACS:      gmx mdrun, version 2018.2
> Executable:   /usr/bin/gmx
> Data prefix:  /usr
> Working dir:  /home/benson/Projects/GromacsBench/d.poly-ch2
> Command line:
>   gmx mdrun
>
> GROMACS version:    2018.2
> Precision:          single
> Memory model:       64 bit
> MPI library:        thread_mpi
> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support:        OpenCL
> SIMD instructions:  SSE2
> FFT library:        fftw-3.3.5-sse2-avx
> RDTSCP usage:       disabled
> TNG support:        enabled
> Hwloc support:      hwloc-1.11.6
> Tracing support:    disabled
> Built on:           2018-07-19 19:45:21
> Built by:           mockbuild@ [CMAKE]
> Build OS/arch:      Linux 4.17.3-200.fc28.x86_64 x86_64
> Build CPU vendor:   Intel
> Build CPU brand:    Intel Core Processor (Haswell, no TSX)
> Build CPU family:   6   Model: 60   Stepping: 1
> Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma intel
>     lahf mmx msr pcid pclmuldq popcnt pse rdrnd rdtscp sse2 sse3 sse4.1
>     sse4.2 ssse3 tdt x2apic
> C compiler:         /usr/bin/cc GNU 8.1.1
> C compiler flags:   -msse2 -O2 -g -pipe -Wall -Werror=format-security
>     -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
>     -fstack-protector-strong -grecord-gcc-switches
>     -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
>     -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
>     -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
>     -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2
>     -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong
>     -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
>     -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
>     -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
>     -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> C++ compiler:       /usr/bin/c++ GNU 8.1.1
> C++ compiler flags: -msse2 -O2 -g -pipe -Wall -Werror=format-security
>     -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions
>     -fstack-protector-strong -grecord-gcc-switches
>     -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
>     -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic
>     -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection
>     -std=c++11 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> OpenCL include dir: /usr/include
> OpenCL library:     /usr/lib64/libOpenCL.so
> OpenCL version:     2.0
>
>
> Running on 1 node with total 8 cores, 8 logical cores, 1 compatible GPU
> Hardware detected:
>   CPU info:
>     Vendor: AMD
>     Brand:  AMD FX(tm)-8350 Eight-Core Processor
>     Family: 21   Model: 2   Stepping: 0
>     Features: aes amd apic avx clfsh cmov cx8 cx16 f16c fma fma4 htt
>       lahf misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse
>       rdtscp sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
>   Hardware topology: Full, with devices
>     Sockets, cores, and logical processors:
>       Socket 0: [ 0] [ 1] [ 2] [ 3] [ 4] [ 5] [ 6] [ 7]
>     Numa nodes:
>       Node 0 (16714620928 bytes mem): 0 1 2 3 4 5 6 7
>       Latency:
>                0
>          0  1.00
>     Caches:
>       L1: 16384 bytes, linesize 64 bytes, assoc. 4, shared 1 ways
>       L2: 2097152 bytes, linesize 64 bytes, assoc. 16, shared 2 ways
>       L3: 8388608 bytes, linesize 64 bytes, assoc. 64, shared 8 ways
>     PCI devices:
>       0000:01:00.0  Id: 1002:67ef  Class: 0x0300  Numa: 0
>       0000:02:00.0  Id: 10ec:8168  Class: 0x0200  Numa: 0
>       0000:00:11.0  Id: 1002:4391  Class: 0x0106  Numa: 0
>   GPU info:
>     Number of GPUs detected: 1
>     #0: name: Radeon RX 560 Series (POLARIS11 / DRM 3.23.0 /
>         4.16.3-301.fc28.x86_64, LLVM 6.0.0), vendor: AMD, device version:
>         OpenCL 1.1 Mesa 18.0.5, stat: compatible
>
> Highest SIMD level requested by all nodes in run: AVX_128_FMA
> SIMD instructions selected at compile time:       SSE2
> This program was compiled for different hardware than you are running on,
> which could influence performance.
> The current CPU can measure timings more accurately than the code in
> gmx mdrun was configured to use. This might affect your simulation
> speed as accurate timings are needed for load-balancing.
> Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake
> option.
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
> Lindahl
> GROMACS: High performance molecular simulations through multi-level
> parallelism from laptops to supercomputers
> SoftwareX 1 (2015) pp. 19-25
> -------- -------- --- Thank You --- -------- --------
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
> Tackling Exascale Software Challenges in Molecular Dynamics Simulations
> with GROMACS
> In S. Markidis & E. Laure (Eds.), Solving Software Challenges for
> Exascale 8759 (2015) pp. 3-27
> -------- -------- --- Thank You --- -------- --------
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
> Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E.
> Lindahl
> GROMACS 4.5: a high-throughput and highly parallel open source molecular
> simulation toolkit
> Bioinformatics 29 (2013) pp. 845-54
> -------- -------- --- Thank You --- -------- --------
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
> GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
> molecular simulation
> J. Chem. Theory Comput. 4 (2008) pp. 435-447
> -------- -------- --- Thank You --- -------- --------
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
> Berendsen
> GROMACS: Fast, Flexible and Free
> J. Comp. Chem. 26 (2005) pp. 1701-1719
> -------- -------- --- Thank You --- -------- --------
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> E. Lindahl and B. Hess and D. van der Spoel
> GROMACS 3.0: A package for molecular simulation and trajectory analysis
> J. Mol. Mod. 7 (2001) pp. 306-317
> -------- -------- --- Thank You --- -------- --------
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, D. van der Spoel and R. van Drunen
> GROMACS: A message-passing parallel molecular dynamics implementation
> Comp. Phys. Comm. 91 (1995) pp. 43-56
> -------- -------- --- Thank You --- -------- --------
>
> Input Parameters:
>    integrator                     = md
>    tinit                          = 0
>    dt                             = 0.001
>    nsteps                         = 5000
>    init-step                      = 0
>    simulation-part                = 1
>    comm-mode                      = Linear
>    nstcomm                        = 100
>    bd-fric                        = 0
>    ld-seed                        = -191216883
>    emtol                          = 10
>    emstep                         = 0.01
>    niter                          = 20
>    fcstep                         = 0
>    nstcgsteep                     = 1000
>    nbfgscorr                      = 10
>    rtpi                           = 0.05
>    nstxout                        = 0
>    nstvout                        = 0
>    nstfout                        = 0
>    nstlog                         = 0
>    nstcalcenergy                  = 100
>    nstenergy                      = 0
>    nstxout-compressed             = 0
>    compressed-x-precision         = 1000
>    cutoff-scheme                  = Verlet
>    nstlist                        = 20
>    ns-type                        = Grid
>    pbc                            = xyz
>    periodic-molecules             = false
>    verlet-buffer-tolerance        = 0.005
>    rlist                          = 0.9
>    coulombtype                    = Cut-off
>    coulomb-modifier               = Potential-shift
>    rcoulomb-switch                = 0
>    rcoulomb                       = 0.9
>    epsilon-r                      = 1
>    epsilon-rf                     = inf
>    vdw-type                       = Cut-off
>    vdw-modifier                   = Potential-shift
>    rvdw-switch                    = 0
>    rvdw                           = 0.9
>    DispCorr                       = No
>    table-extension                = 1
>    fourierspacing                 = 0.12
>    fourier-nx                     = 0
>    fourier-ny                     = 0
>    fourier-nz                     = 0
>    pme-order                      = 4
>    ewald-rtol                     = 1e-05
>    ewald-rtol-lj                  = 0.001
>    lj-pme-comb-rule               = Geometric
>    ewald-geometry                 = 0
>    epsilon-surface                = 0
>    implicit-solvent               = No
>    gb-algorithm                   = Still
>    nstgbradii                     = 1
>    rgbradii                       = 1
>    gb-epsilon-solvent             = 80
>    gb-saltconc                    = 0
>    gb-obc-alpha                   = 1
>    gb-obc-beta                    = 0.8
>    gb-obc-gamma                   = 4.85
>    gb-dielectric-offset           = 0.009
>    sa-algorithm                   = Ace-approximation
>    sa-surface-tension             = 2.05016
>    tcoupl                         = Berendsen
>    nsttcouple                     = 20
>    nh-chain-length                = 0
>    print-nose-hoover-chain-variables = false
>    pcoupl                         = No
>    pcoupltype                     = Isotropic
>    nstpcouple                     = -1
>    tau-p                          = 1
>    compressibility (3x3):
>       compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>    ref-p (3x3):
>       ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>    refcoord-scaling               = No
>    posres-com (3):
>       posres-com[0]= 0.00000e+00
>       posres-com[1]= 0.00000e+00
>       posres-com[2]= 0.00000e+00
>    posres-comB (3):
>       posres-comB[0]= 0.00000e+00
>       posres-comB[1]= 0.00000e+00
>       posres-comB[2]= 0.00000e+00
>    QMMM                           = false
>    QMconstraints                  = 0
>    QMMMscheme                     = 0
>    MMChargeScaleFactor            = 1
>    qm-opts:
>       ngQM                        = 0
>    constraint-algorithm           = Lincs
>    continuation                   = false
>    Shake-SOR                      = false
>    shake-tol                      = 0.0001
>    lincs-order                    = 4
>    lincs-iter                     = 1
>    lincs-warnangle                = 30
>    nwall                          = 0
>    wall-type                      = 9-3
>    wall-r-linpot                  = -1
>    wall-atomtype[0]               = -1
>    wall-atomtype[1]               = -1
>    wall-density[0]                = 0
>    wall-density[1]                = 0
>    wall-ewald-zfac                = 3
>    pull                           = false
>    awh                            = false
>    rotation                       = false
>    interactiveMD                  = false
>    disre                          = No
>    disre-weighting                = Conservative
>    disre-mixed                    = false
>    dr-fc                          = 1000
>    dr-tau                         = 0
>    nstdisreout                    = 100
>    orire-fc                       = 0
>    orire-tau                      = 0
>    nstorireout                    = 100
>    free-energy                    = no
>    cos-acceleration               = 0
>    deform (3x3):
>       deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>       deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
>    simulated-tempering            = false
>    swapcoords                     = no
>    userint1                       = 0
>    userint2                       = 0
>    userint3                       = 0
>    userint4                       = 0
>    userreal1                      = 0
>    userreal2                      = 0
>    userreal3                      = 0
>    userreal4                      = 0
>    applied-forces:
>      electric-field:
>        x:
>          E0                       = 0
>          omega                    = 0
>          t0                       = 0
>          sigma                    = 0
>        y:
>          E0                       = 0
>          omega                    = 0
>          t0                       = 0
>          sigma                    = 0
>        z:
>          E0                       = 0
>          omega                    = 0
>          t0                       = 0
>          sigma                    = 0
> grpopts:
>    nrdf:  17997
>    ref-t:   300
>    tau-t:   0.1
> annealing: No
> annealing-npoints: 0
>    acc:       0  0  0
>    nfreeze:   N  N  N
>    energygrp-flags[  0]: 0
>
> Changing nstlist from 20 to 100, rlist from 0.9 to 0.905
>
>
> Using 1 MPI thread
> Using 8 OpenMP threads
>
> 1 GPU auto-selected for this run.
> Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
>   PP:0
> Pinning threads with an auto-selected logical core stride of 1
> System total charge: 0.000
> Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Coulomb -1e+00
>
> Using GPU 8x8 nonbonded short-range kernels
>
> Using a 8x4 pair-list setup:
>   updated every 100 steps, buffer 0.005 nm, rlist 0.905 nm
> At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list
> would be:
>   updated every 100 steps, buffer 0.076 nm, rlist 0.976 nm
>
> Using geometric Lennard-Jones combination rule
>
> Removing pbc first time
>
> Intra-simulation communication will occur every 20 steps.
> Center of mass motion removal mode is Linear
> We have the following groups for center of mass motion removal:
>   0: rest
>
> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
> Molecular dynamics with coupling to an external bath
> J. Chem. Phys. 81 (1984) pp. 3684-3690
> -------- -------- --- Thank You --- -------- --------
>
> There are: 6000 Atoms
> There are: 6000 VSites
> Initial temperature: 450.358 K
>
> Started mdrun on rank 0 Mon Sep 10 21:00:27 2018
>            Step           Time
>               0        0.00000
>
> Energies (kJ/mol)
>           Bond          Angle  Ryckaert-Bell.        LJ (SR)   Coulomb (SR)
>    1.10780e+04    1.13402e+04    1.88807e+04   -2.19619e+04    0.00000e+00
>      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
>    1.93369e+04    3.36615e+04    5.29983e+04    5.29983e+04    4.49913e+02
> Pressure (bar)
>    8.20510e+02
>
>            Step           Time
>            5000        5.00000
>
> Writing checkpoint, step 5000 at Mon Sep 10 21:04:37 2018
>
>
> Energies (kJ/mol)
>           Bond          Angle  Ryckaert-Bell.        LJ (SR)   Coulomb (SR)
>    7.30979e+03    7.57440e+03    1.48801e+04   -2.30979e+04    0.00000e+00
>      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
>    6.66641e+03    2.25799e+04    2.92463e+04    5.28503e+04    3.01799e+02
> Pressure (bar)
>   -8.06942e+01
>
>    <======  ###############  ==>
>    <====  A V E R A G E S  ====>
>    <==  ###############  ======>
>
> Statistics over 5001 steps using 51 frames
>
> Energies (kJ/mol)
>           Bond          Angle  Ryckaert-Bell.        LJ (SR)   Coulomb (SR)
>    7.59408e+03    7.81450e+03    1.51294e+04   -2.29783e+04    0.00000e+00
>      Potential    Kinetic En.   Total Energy  Conserved En.    Temperature
>    7.55967e+03    2.30250e+04    3.05847e+04    5.29245e+04    3.07748e+02
> Pressure (bar)
>    2.63622e+01
>
> Total Virial (kJ/mol)
>    7.74123e+03    2.93639e+02    1.13344e+02
>    2.93639e+02    7.68271e+03   -3.40627e+02
>    1.13345e+02   -3.40625e+02    7.17150e+03
>
> Pressure (bar)
>   -1.13044e+01   -5.10385e+01   -1.83614e+01
>   -5.10385e+01   -2.53181e+00    6.71371e+01
>   -1.83616e+01    6.71366e+01    9.29227e+01
>
>
> M E G A - F L O P S   A C C O U N T I N G
>
> NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
> W3=SPC/TIP3p  W4=TIP4p (single or pairs)
> V&F=Potential and force  V=Potential only  F=Force only
>
> Computing:                        M-Number       M-Flops  % Flops
> -----------------------------------------------------------------------------
>  Pair Search distance check      27.160704       244.446      0.0
>  NxN RF Elec. + LJ [F]        24321.496320    924216.860     96.8
>  NxN RF Elec. + LJ [V&F]        250.586880     13531.692      1.4
>  Shift-X                          0.612000         3.672      0.0
>  Bonds                           30.000999      1770.059      0.2
>  Angles                          29.995998      5039.328      0.5
>  RB-Dihedrals                    29.990997      7407.776      0.8
>  Virial                           0.614295        11.057      0.0
>  Stop-CM                          0.624000         6.240      0.0
>  Calc-Ekin                        6.024000       162.648      0.0
>  Virtual Site 3fd                29.995998      2849.620      0.3
>  Virtual Site 3fad                0.010002         1.760      0.0
> -----------------------------------------------------------------------------
>  Total                                        955245.158    100.0
> -----------------------------------------------------------------------------
>
>
> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
> On 1 MPI rank, each using 8 OpenMP threads
>
> Computing:             Num   Num     Call    Wall time   Giga-Cycles
>                       Ranks Threads  Count      (s)      total sum     %
> -----------------------------------------------------------------------------
>  Vsite constr.           1    8      5001     19.000       610.049    7.6
>  Neighbor search         1    8        51      0.878        28.190    0.4
>  Launch GPU ops.         1    8      5001     13.524       434.216    5.4
>  Force                   1    8      5001     88.859      2853.066   35.5
>  Wait GPU NB local       1    8      5001      1.060        34.044    0.4
>  NB X/F buffer ops.      1    8      9951     41.072      1318.714   16.4
>  Vsite spread            1    8      5001     38.567      1238.308   15.4
>  Write traj.             1    8         1      0.062         1.999    0.0
>  Update                  1    8      5001     44.615      1432.481   17.8
>  Rest                                          2.560        82.197    1.0
> -----------------------------------------------------------------------------
>  Total                                       250.198      8033.266  100.0
> -----------------------------------------------------------------------------
>
> GPU timings
> -----------------------------------------------------------------------------
> Computing:                    Count  Wall t (s)   ms/step       %
> -----------------------------------------------------------------------------
>  Pair list H2D                   51      0.001      0.024      0.0
>  X / q H2D                     5001      0.029      0.006      0.3
>  Nonbonded F kernel            4950      8.437      1.704     77.8
>  Nonbonded F+ene+prune k.        51      0.213      4.167      2.0
>  F D2H                         5001      2.171      0.434     20.0
> -----------------------------------------------------------------------------
>  Total                                  10.851      2.170    100.0
> -----------------------------------------------------------------------------
>
> Average per-step force GPU/CPU evaluation time ratio: 2.170 ms/17.768 ms
> = 0.122
>
>                Core t (s)   Wall t (s)        (%)
>        Time:     2001.585      250.198      800.0
>                  (ns/day)    (hour/ns)
> Performance:        1.727       13.897
> Finished mdrun on rank 0 Mon Sep 10 21:04:37 2018
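To check which OpenCL stack mdrun will actually pick up, something along these lines should work (a minimal sketch, assuming the stock clinfo utility is installed; the platform names are indicative, not verbatim):

  # List the OpenCL platforms and devices the runtime exposes:
  clinfo | grep -i -E 'platform name|device name|device version'
  # The quoted log above reports "OpenCL 1.1 Mesa 18.0.5", i.e. the Mesa
  # (Clover) stack; with ROCm installed instead, an "AMD Accelerated
  # Parallel Processing" platform with a higher OpenCL version should
  # be listed.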