Some results (probably suboptimal) for the d.poly-ch2 benchmark on a desktop running Fedora 28, using the GROMACS OpenCL build from the Fedora repositories:

Log file opened on Mon Sep 10 21:00:25 2018
Host: mikihir  pid: 32669  rank ID: 0  number of ranks:  1
                      :-) GROMACS - gmx mdrun, 2018.2 (-:

                            GROMACS is written by:
     Emile Apol      Rossen Apostolov      Paul Bauer     Herman J.C. Berendsen
    Par Bjelkmar    Aldert van Buuren   Rudi van Drunen     Anton Feenstra
  Gerrit Groenhof    Aleksei Iupinov   Christoph Junghans   Anca Hamuraru
 Vincent Hindriksen Dimitrios Karkoulis    Peter Kasson        Jiri Kraus
  Carsten Kutzner      Per Larsson      Justin A. Lemkul    Viveca Lindahl
  Magnus Lundborg   Pieter Meulenhoff    Erik Marklund      Teemu Murtola
    Szilard Pall       Sander Pronk      Roland Schulz     Alexey Shvetsov
   Michael Shirts     Alfons Sijbers     Peter Tieleman    Teemu Virolainen
 Christian Wennberg    Maarten Wolf
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2017, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, version 2018.2
Executable:   /usr/bin/gmx
Data prefix:  /usr
Working dir:  /home/benson/Projects/GromacsBench/d.poly-ch2
Command line:
  gmx mdrun

GROMACS version:    2018.2
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        OpenCL
SIMD instructions:  SSE2
FFT library:        fftw-3.3.5-sse2-avx
RDTSCP usage:       disabled
TNG support:        enabled
Hwloc support:      hwloc-1.11.6
Tracing support:    disabled
Built on:           2018-07-19 19:45:21
Built by:           mockbuild@ [CMAKE]
Build OS/arch:      Linux 4.17.3-200.fc28.x86_64 x86_64
Build CPU vendor:   Intel
Build CPU brand:    Intel Core Processor (Haswell, no TSX)
Build CPU family:   6   Model: 60   Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma intel lahf mmx msr pcid pclmuldq popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /usr/bin/cc GNU 8.1.1
C compiler flags:    -msse2   -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection  -DNDEBUG -funroll-all-loops -fexcess-precision=fast
C++ compiler:       /usr/bin/c++ GNU 8.1.1
C++ compiler flags:  -msse2   -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -std=c++11   -DNDEBUG -funroll-all-loops -fexcess-precision=fast
OpenCL include dir: /usr/include
OpenCL library:     /usr/lib64/libOpenCL.so
OpenCL version:     2.0


Running on 1 node with total 8 cores, 8 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: AMD
    Brand:  AMD FX(tm)-8350 Eight-Core Processor
    Family: 21   Model: 2   Stepping: 0
    Features: aes amd apic avx clfsh cmov cx8 cx16 f16c fma fma4 htt lahf misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdtscp sse2 sse3 sse4a sse4.1 sse4.2 ssse3 xop
  Hardware topology: Full, with devices
    Sockets, cores, and logical processors:
      Socket  0: [   0] [   1] [   2] [   3] [   4] [   5] [   6] [   7]
    Numa nodes:
      Node  0 (16714620928 bytes mem):   0   1   2   3   4   5   6 7
      Latency:
               0
         0  1.00
    Caches:
      L1: 16384 bytes, linesize 64 bytes, assoc. 4, shared 1 ways
      L2: 2097152 bytes, linesize 64 bytes, assoc. 16, shared 2 ways
      L3: 8388608 bytes, linesize 64 bytes, assoc. 64, shared 8 ways
    PCI devices:
      0000:01:00.0  Id: 1002:67ef  Class: 0x0300  Numa: 0
      0000:02:00.0  Id: 10ec:8168  Class: 0x0200  Numa: 0
      0000:00:11.0  Id: 1002:4391  Class: 0x0106  Numa: 0
  GPU info:
    Number of GPUs detected: 1
    #0: name: Radeon RX 560 Series (POLARIS11 / DRM 3.23.0 / 4.16.3-301.fc28.x86_64, LLVM 6.0.0), vendor: AMD, device version: OpenCL 1.1 Mesa 18.0.5, stat: compatible

Highest SIMD level requested by all nodes in run: AVX_128_FMA
SIMD instructions selected at compile time:       SSE2
This program was compiled for different hardware than you are running on,
which could influence performance.
The current CPU can measure timings more accurately than the code in
gmx mdrun was configured to use. This might affect your simulation
speed as accurate timings are needed for load-balancing.
Please consider rebuilding gmx mdrun with the GMX_USE_RDTSCP=ON CMake option.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E.
Lindahl
GROMACS: High performance molecular simulations through multi-level
parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with
GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R.
Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular
simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C.
Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.001
   nsteps                         = 5000
   init-step                      = 0
   simulation-part                = 1
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = -191216883
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 0
   nstcalcenergy                  = 100
   nstenergy                      = 0
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 20
   ns-type                        = Grid
   pbc                            = xyz
   periodic-molecules             = false
   verlet-buffer-tolerance        = 0.005
   rlist                          = 0.9
   coulombtype                    = Cut-off
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 0.9
   epsilon-r                      = 1
   epsilon-rf                     = inf
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 0.9
   DispCorr                       = No
   table-extension                = 1
   fourierspacing                 = 0.12
   fourier-nx                     = 0
   fourier-ny                     = 0
   fourier-nz                     = 0
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   implicit-solvent               = No
   gb-algorithm                   = Still
   nstgbradii                     = 1
   rgbradii                       = 1
   gb-epsilon-solvent             = 80
   gb-saltconc                    = 0
   gb-obc-alpha                   = 1
   gb-obc-beta                    = 0.8
   gb-obc-gamma                   = 4.85
   gb-dielectric-offset           = 0.009
   sa-algorithm                   = Ace-approximation
   sa-surface-tension             = 2.05016
   tcoupl                         = Berendsen
   nsttcouple                     = 20
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = false
   pcoupl                         = No
   pcoupltype                     = Isotropic
   nstpcouple                     = -1
   tau-p                          = 1
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00, 0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00, 0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00, 0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = No
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = false
   QMconstraints                  = 0
   QMMMscheme                     = 0
   MMChargeScaleFactor            = 1
qm-opts:
   ngQM                           = 0
   constraint-algorithm           = Lincs
   continuation                   = false
   Shake-SOR                      = false
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 30
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = false
   awh                            = false
   rotation                       = false
   interactiveMD                  = false
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = false
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = false
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
   applied-forces:
     electric-field:
       x:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       y:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       z:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
grpopts:
   nrdf:       17997
   ref-t:         300
   tau-t:         0.1
annealing:          No
annealing-npoints:           0
   acc:               0           0           0
   nfreeze:           N           N           N
   energygrp-flags[  0]: 0

Changing nstlist from 20 to 100, rlist from 0.9 to 0.905


Using 1 MPI thread
Using 8 OpenMP threads

1 GPU auto-selected for this run.
Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
  PP:0
Pinning threads with an auto-selected logical core stride of 1
System total charge: 0.000
Potential shift: LJ r^-12: -3.541e+00 r^-6: -1.882e+00, Coulomb -1e+00

Using GPU 8x8 nonbonded short-range kernels

Using a 8x4 pair-list setup:
  updated every 100 steps, buffer 0.005 nm, rlist 0.905 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
  updated every 100 steps, buffer 0.076 nm, rlist 0.976 nm

Using geometric Lennard-Jones combination rule

Removing pbc first time

Intra-simulation communication will occur every 20 steps.
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------

There are: 6000 Atoms
There are: 6000 VSites
Initial temperature: 450.358 K

Started mdrun on rank 0 Mon Sep 10 21:00:27 2018
           Step           Time
              0        0.00000

   Energies (kJ/mol)
           Bond          Angle Ryckaert-Bell.        LJ (SR) Coulomb (SR)
    1.10780e+04    1.13402e+04    1.88807e+04   -2.19619e+04 0.00000e+00
      Potential    Kinetic En.   Total Energy  Conserved En. Temperature
    1.93369e+04    3.36615e+04    5.29983e+04    5.29983e+04 4.49913e+02
 Pressure (bar)
    8.20510e+02

           Step           Time
           5000        5.00000

Writing checkpoint, step 5000 at Mon Sep 10 21:04:37 2018


   Energies (kJ/mol)
           Bond          Angle Ryckaert-Bell.        LJ (SR) Coulomb (SR)
    7.30979e+03    7.57440e+03    1.48801e+04   -2.30979e+04 0.00000e+00
      Potential    Kinetic En.   Total Energy  Conserved En. Temperature
    6.66641e+03    2.25799e+04    2.92463e+04    5.28503e+04 3.01799e+02
 Pressure (bar)
   -8.06942e+01

    <======  ###############  ==>
    <====  A V E R A G E S  ====>
    <==  ###############  ======>

    Statistics over 5001 steps using 51 frames

   Energies (kJ/mol)
           Bond          Angle Ryckaert-Bell.        LJ (SR) Coulomb (SR)
    7.59408e+03    7.81450e+03    1.51294e+04   -2.29783e+04 0.00000e+00
      Potential    Kinetic En.   Total Energy  Conserved En. Temperature
    7.55967e+03    2.30250e+04    3.05847e+04    5.29245e+04 3.07748e+02
 Pressure (bar)
    2.63622e+01

   Total Virial (kJ/mol)
    7.74123e+03    2.93639e+02    1.13344e+02
    2.93639e+02    7.68271e+03   -3.40627e+02
    1.13345e+02   -3.40625e+02    7.17150e+03

   Pressure (bar)
   -1.13044e+01   -5.10385e+01   -1.83614e+01
   -5.10385e+01   -2.53181e+00    6.71371e+01
   -1.83616e+01    6.71366e+01    9.29227e+01


    M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops % Flops
-----------------------------------------------------------------------------
 Pair Search distance check              27.160704 244.446     0.0
 NxN RF Elec. + LJ [F]                24321.496320 924216.860    96.8
 NxN RF Elec. + LJ [V&F]                250.586880 13531.692     1.4
 Shift-X                                  0.612000 3.672     0.0
 Bonds                                   30.000999 1770.059     0.2
 Angles                                  29.995998 5039.328     0.5
 RB-Dihedrals                            29.990997 7407.776     0.8
 Virial                                   0.614295 11.057     0.0
 Stop-CM                                  0.624000 6.240     0.0
 Calc-Ekin                                6.024000 162.648     0.0
 Virtual Site 3fd                        29.995998 2849.620     0.3
 Virtual Site 3fad                        0.010002 1.760     0.0
-----------------------------------------------------------------------------
 Total                                                  955245.158 100.0
-----------------------------------------------------------------------------


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 8 OpenMP threads

 Computing:          Num   Num      Call    Wall time Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Vsite constr.          1    8       5001      19.000 610.049   7.6
 Neighbor search        1    8         51       0.878 28.190   0.4
 Launch GPU ops.        1    8       5001      13.524 434.216   5.4
 Force                  1    8       5001      88.859 2853.066  35.5
 Wait GPU NB local      1    8       5001       1.060 34.044   0.4
 NB X/F buffer ops.     1    8       9951      41.072 1318.714  16.4
 Vsite spread           1    8       5001      38.567 1238.308  15.4
 Write traj.            1    8          1       0.062 1.999   0.0
 Update                 1    8       5001      44.615 1432.481  17.8
 Rest                                           2.560 82.197   1.0
-----------------------------------------------------------------------------
 Total                                        250.198       8033.266 100.0
-----------------------------------------------------------------------------

 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s) ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                         51       0.001 0.024     0.0
 X / q H2D                           5001       0.029 0.006     0.3
 Nonbonded F kernel                  4950       8.437 1.704    77.8
 Nonbonded F+ene+prune k.              51       0.213 4.167     2.0
 F D2H                               5001       2.171 0.434    20.0
-----------------------------------------------------------------------------
 Total                                         10.851        2.170 100.0
-----------------------------------------------------------------------------

Average per-step force GPU/CPU evaluation time ratio: 2.170 ms/17.768 ms = 0.122

               Core t (s)   Wall t (s)        (%)
       Time:     2001.585      250.198      800.0
                 (ns/day)    (hour/ns)
Performance:        1.727       13.897
Finished mdrun on rank 0 Mon Sep 10 21:04:37 2018
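
As a sanity check, the summary figures at the end of the log can be reproduced from the raw numbers it reports. The following is a minimal Python sketch, not anything mdrun itself runs. It assumes the performance line is based on 5001 integrated steps (nsteps + 1, matching the "Statistics over 5001 steps" line) and that the per-step CPU force time is the "Force" row wall time divided by its call count. All variable names are my own.

```python
# Values copied from the log above; variable names are illustrative only.
dt_ps = 0.001            # dt from the input parameters, in ps
steps = 5001             # assumed step count (nsteps + 1), as in the averages
wall_s = 250.198         # total wall time, in s
force_cpu_s = 88.859     # "Force" row of the cycle accounting, in s
gpu_total_s = 10.851     # "Total" row of the GPU timings, in s
calls = 5001             # call count of the per-step force kernels

# Throughput: simulated time per day of wall-clock time.
simulated_ns = steps * dt_ps / 1000.0
ns_per_day = simulated_ns * 86400.0 / wall_s
hours_per_ns = 24.0 / ns_per_day

# Per-step force evaluation time on each device, in ms.
cpu_ms_per_step = force_cpu_s / calls * 1000.0
gpu_ms_per_step = gpu_total_s / calls * 1000.0
gpu_cpu_ratio = gpu_ms_per_step / cpu_ms_per_step

print(f"Performance: {ns_per_day:.3f} ns/day, {hours_per_ns:.3f} hour/ns")
print(f"GPU/CPU force time: {gpu_ms_per_step:.3f} ms / "
      f"{cpu_ms_per_step:.3f} ms = {gpu_cpu_ratio:.3f}")
```

Under these assumptions the sketch matches the reported 1.727 ns/day, 13.897 hour/ns, and the 0.122 GPU/CPU ratio. A ratio that far below 1 means the GPU finishes its share long before the CPU does, so the CPU side (Force, Update, and the buffer and virtual-site operations) dominates the wall time in this run.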
