Hi Paul,

> On 12. Dec 2018, at 15:36, pbusc...@q.com wrote:
> 
> Dear users (one more try),
> 
> I am trying to use 2 GPU cards to improve modeling speed. The computer 
> described in the log files is the one I use to iron out models, and I am 
> using it to learn how to run on two GPU cards before purchasing two new 
> RTX 2080 Tis. The CPU is an 8-core/16-thread AMD and the GPUs are two 
> GTX 1060s; there are 50,000 atoms in the model.
> 
> Using ntmpi:ntomp settings of 1:16, auto (which chose 4:4), and 2:8 (and 
> any other combination factoring to 16), the ns/day rating is approx. 12-16 
> for 1:16 and ~6-8 for any other setting, i.e. adding a card cuts 
> performance roughly in half. The average load imbalance is less than 3.4% 
> for the multi-card setup.
> 
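> For reference, these settings correspond to invocations along these lines 
> (sketches; "md" is just a placeholder for my actual run files):
> 
>    gmx mdrun -deffnm md -ntmpi 1 -ntomp 16   # 1 rank x 16 OpenMP threads
>    gmx mdrun -deffnm md                      # automatic, which chose 4 x 4
>    gmx mdrun -deffnm md -ntmpi 2 -ntomp 8    # 2 ranks x 8 threads each
> 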
> I am not at this point trying to maximize efficiency, but only to show some 
> improvement going from one to two cards. According to a 2015 paper from the 
> GROMACS group, "Best bang for your buck: GPU nodes for GROMACS biomolecular 
> simulations", I should expect maybe (at best) a 50% improvement for 90k 
> atoms (with 2x GTX 970).
We did not benchmark the GTX 970 in that publication.

But from Table 6 you can see that we also had quite a few cases with our 80k
benchmark where simulation speed did not increase much when going from 1 to
2 GPUs: e.g., for the E5-2670v2, going from one to two GTX 980 GPUs led to
an increase of only 10 percent.
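
Also, note that in your single-rank log PME runs on the GPU (the cycle
accounting has a "Wait PME GPU gather" row), while in both multi-rank logs
the "PME mesh" rows show PME running on the CPU and taking more than half
of the wall time. To keep PME on a GPU when using more than one rank you
need a dedicated PME rank. A minimal sketch, assuming two GPUs with IDs 0
and 1 (check gmx mdrun -h for the exact options of your GROMACS version):

   gmx mdrun -ntmpi 2 -ntomp 8 -npme 1 -pme gpu -gputasks 01 -pin on

Here -gputasks 01 assigns the PP task to GPU 0 and the PME task to GPU 1;
whether this helps at 50k atoms you would have to measure.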

Did you use counter resetting for the benchmarks?
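
If not: for runs this short, resetting the cycle counters partway through
the run excludes the startup and PP/PME tuning phase from the reported
ns/day, which can otherwise distort the comparison. For example (these are
standard mdrun options; the step counts are only illustrative):

   gmx mdrun -ntmpi 2 -ntomp 8 -nsteps 20000 -resethway

-resethway resets the counters at the halfway point; alternatively,
-resetstep <step> resets them at a given step.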

Carsten


> What bothers me in my initial attempts is that my simulations became slower 
> when I added the second GPU - it is frustrating, to say the least. It's 
> like swimming backwards.
> 
> I know I am missing - at a minimum - the correct setup for mdrun, and 
> suggestions would be welcome.
> 
> The output from the last section of the log files is included below.
> 
> =========================== ntmpi 1 : ntomp 16 ==============================
> 
>       <======  ###############  ==>
>       <====  A V E R A G E S  ====>
>       <==  ###############  ======>
> 
>       Statistics over 29301 steps using 294 frames
> 
>   Energies (kJ/mol)
>          Angle       G96Angle    Proper Dih.  Improper Dih.          LJ-14
>    9.17533e+05    2.27874e+04    6.64128e+04    2.31214e+02    8.34971e+04
>     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>   -2.84567e+07   -1.43385e+05   -2.04658e+03    1.33320e+07    1.59914e+05
> Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>    7.79893e+01   -1.40196e+07    1.88467e+05   -1.38312e+07    3.00376e+02
> Pres. DC (bar) Pressure (bar)   Constr. rmsd
>   -2.88685e+00    3.75436e+01    0.00000e+00
> 
>   Total Virial (kJ/mol)
>    5.27555e+04   -4.87626e+02    1.86144e+02
>   -4.87648e+02    4.04479e+04   -1.91959e+02
>    1.86177e+02   -1.91957e+02    5.45671e+04
> 
>   Pressure (bar)
>    2.22202e+01    1.27887e+00   -4.71738e-01
>    1.27893e+00    6.48135e+01    5.12638e-01
>   -4.71830e-01    5.12632e-01    2.55971e+01
> 
>         T-PDMS         T-VMOS
>    2.99822e+02    3.32834e+02
> 
> 
>       M E G A - F L O P S   A C C O U N T I N G
> 
> NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
> W3=SPC/TIP3p  W4=TIP4p (single or pairs)
> V&F=Potential and force  V=Potential only  F=Force only
> 
> Computing:                               M-Number         M-Flops  % Flops
> -----------------------------------------------------------------------------
> Pair Search distance check            2349.753264       21147.779     0.0
> NxN Ewald Elec. + LJ [F]           1771584.591744   116924583.055    96.6
> NxN Ewald Elec. + LJ [V&F]           17953.091840     1920980.827     1.6
> 1,4 nonbonded interactions            5278.575150      475071.763     0.4
> Shift-X                                 22.173480         133.041     0.0
> Angles                                4178.908620      702056.648     0.6
> Propers                                879.909030      201499.168     0.2
> Impropers                                5.274180        1097.029     0.0
> Pos. Restr.                             42.193440        2109.672     0.0
> Virial                                  22.186710         399.361     0.0
> Update                                2209.881420       68506.324     0.1
> Stop-CM                                 22.248900         222.489     0.0
> Calc-Ekin                               44.346960        1197.368     0.0
> Lincs                                 4414.639320      264878.359     0.2
> Lincs-Mat                           100297.229760      401188.919     0.3
> Constraint-V                          8829.127980       70633.024     0.1
> Constraint-Vir                          22.147020         531.528     0.0
> -----------------------------------------------------------------------------
> Total                                               121056236.355   100.0
> -----------------------------------------------------------------------------
> 
>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
> On 1 MPI rank, each using 16 OpenMP threads
> 
> Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                     Ranks Threads  Count      (s)         total sum    %
> -----------------------------------------------------------------------------
> Neighbor search        1   16        294       2.191        129.485   1.0
> Launch GPU ops.        1   16      58602       4.257        251.544   2.0
> Force                  1   16      29301      23.769       1404.510  11.3
> Wait PME GPU gather    1   16      29301      33.740       1993.695  16.0
> Reduce GPU PME F       1   16      29301       7.244        428.079   3.4
> Wait GPU NB local      1   16      29301      60.054       3548.612  28.5
> NB X/F buffer ops.     1   16      58308       9.823        580.459   4.7
> Write traj.            1   16          7       0.119          7.048   0.1
> Update                 1   16      58602      11.089        655.275   5.3
> Constraints            1   16      58602      40.378       2385.992  19.2
> Rest                                          17.743       1048.462   8.4
> -----------------------------------------------------------------------------
> Total                                        210.408      12433.160 100.0
> -----------------------------------------------------------------------------
> 
>               Core t (s)   Wall t (s)        (%)
>       Time:     3366.529      210.408     1600.0
>                 (ns/day)    (hour/ns)
> Performance:       12.032        1.995
> Finished mdrun on rank 0 Mon Dec 10 17:17:04 2018
> 
> 
> ====================== ntmpi and ntomp auto (4:4) =======================
> 
> 
>       <======  ###############  ==>
>       <====  A V E R A G E S  ====>
>       <==  ###############  ======>
> 
>       Statistics over 3301 steps using 34 frames
> 
>   Energies (kJ/mol)
>          Angle       G96Angle    Proper Dih.  Improper Dih.          LJ-14
>    9.20586e+05    1.95534e+04    6.56058e+04    2.21093e+02    8.56673e+04
>     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>   -2.84553e+07   -1.44595e+05   -2.04658e+03    1.34518e+07    4.26167e+04
> Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>    3.83653e+01   -1.40159e+07    1.90353e+05   -1.38255e+07    3.03381e+02
> Pres. DC (bar) Pressure (bar)   Constr. rmsd
>   -2.88685e+00    2.72913e+02    0.00000e+00
> 
>   Total Virial (kJ/mol)
>   -5.05948e+04   -3.29107e+03    4.84786e+02
>   -3.29135e+03   -3.42006e+04   -3.32392e+03
>    4.84606e+02   -3.32403e+03   -2.06849e+04
> 
>   Pressure (bar)
>    3.09713e+02    8.98192e+00   -1.19828e+00
>    8.98270e+00    2.73248e+02    8.99543e+00
>   -1.19778e+00    8.99573e+00    2.35776e+02
> 
>         T-PDMS         T-VMOS
>    2.98623e+02    5.82467e+02
> 
> 
>       P P   -   P M E   L O A D   B A L A N C I N G
> 
> NOTE: The PP/PME load balancing was limited by the maximum allowed grid scaling,
>       you might not have reached a good load balance.
> 
> PP/PME load balancing changed the cut-off and PME settings:
>           particle-particle                    PME
>            rcoulomb  rlist            grid      spacing   1/beta
>   initial  1.000 nm  1.000 nm     160 160 128   0.156 nm  0.320 nm
>   final    1.628 nm  1.628 nm      96  96  80   0.260 nm  0.521 nm
> cost-ratio           4.31             0.23
> (note that these numbers concern only part of the total PP and PME load)
> 
> 
>       M E G A - F L O P S   A C C O U N T I N G
> 
> NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
> W3=SPC/TIP3p  W4=TIP4p (single or pairs)
> V&F=Potential and force  V=Potential only  F=Force only
> 
> Computing:                               M-Number         M-Flops  % Flops
> -----------------------------------------------------------------------------
> Pair Search distance check             285.793872        2572.145     0.0
> NxN Ewald Elec. + LJ [F]            367351.034688    24245168.289    92.1
> NxN Ewald Elec. + LJ [V&F]            3841.181056      411006.373     1.6
> 1,4 nonbonded interactions             594.675150       53520.763     0.2
> Calc Weights                           746.884260       26887.833     0.1
> Spread Q Bspline                     15933.530880       31867.062     0.1
> Gather F Bspline                     15933.530880       95601.185     0.4
> 3D-FFT                              154983.295306     1239866.362     4.7
> Solve PME                               40.079616        2565.095     0.0
> Reset In Box                             2.564280           7.693     0.0
> CG-CoM                                   2.639700           7.919     0.0
> Angles                                 470.788620       79092.488     0.3
> Propers                                 99.129030       22700.548     0.1
> Impropers                                0.594180         123.589     0.0
> Pos. Restr.                              4.753440         237.672     0.0
> Virial                                   2.570400          46.267     0.0
> Update                                 248.961420        7717.804     0.0
> Stop-CM                                  2.639700          26.397     0.0
> Calc-Ekin                                5.128560         138.471     0.0
> Lincs                                  557.713246       33462.795     0.1
> Lincs-Mat                            12624.363456       50497.454     0.2
> Constraint-V                          1115.257670        8922.061     0.0
> Constraint-Vir                           2.871389          68.913     0.0
> -----------------------------------------------------------------------------
> Total                                                26312105.181   100.0
> -----------------------------------------------------------------------------
> 
> 
>    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> 
> av. #atoms communicated per step for force:  2 x 16748.9
> av. #atoms communicated per step for LINCS:  2 x 9361.6
> 
> 
> Dynamic load balancing report:
> DLB was off during the run due to low measured imbalance.
> Average load imbalance: 3.4%.
> The balanceable part of the MD step is 46%, load imbalance is computed from this.
> Part of the total run time spent waiting due to load imbalance: 1.6%.
> 
> 
>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
> On 4 MPI ranks, each using 4 OpenMP threads
> 
> Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                     Ranks Threads  Count      (s)         total sum    %
> -----------------------------------------------------------------------------
> Domain decomp.         4    4         34       0.457         26.976   1.0
> DD comm. load          4    4          2       0.000          0.008   0.0
> Neighbor search        4    4         34       0.138          8.160   0.3
> Launch GPU ops.        4    4       6602       0.441         26.070   0.9
> Comm. coord.           4    4       3267       0.577         34.081   1.2
> Force                  4    4       3301       2.298        135.761   4.9
> Wait + Comm. F         4    4       3301       0.276         16.330   0.6
> PME mesh               4    4       3301      25.822       1525.817  54.8
> Wait GPU NB nonloc.    4    4       3301       0.132          7.819   0.3
> Wait GPU NB local      4    4       3301       0.012          0.724   0.0
> NB X/F buffer ops.     4    4      13136       0.471         27.822   1.0
> Write traj.            4    4          2       0.014          0.839   0.0
> Update                 4    4       6602       1.006         59.442   2.1
> Constraints            4    4       6602       6.926        409.290  14.7
> Comm. energies         4    4         34       0.009          0.524   0.0
> Rest                                           8.548        505.108  18.1
> -----------------------------------------------------------------------------
> Total                                         47.127       2784.772 100.0
> -----------------------------------------------------------------------------
> Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
> PME redist. X/F        4    4       6602       2.538        149.998   5.4
> PME spread             4    4       3301       6.055        357.770  12.8
> PME gather             4    4       3301       3.432        202.814   7.3
> PME 3D-FFT             4    4       6602      10.559        623.925  22.4
> PME 3D-FFT Comm.       4    4       6602       2.691        158.993   5.7
> PME solve Elec         4    4       3301       0.521         30.805   1.1
> -----------------------------------------------------------------------------
> 
>               Core t (s)   Wall t (s)        (%)
>       Time:      754.033       47.127     1600.0
>                 (ns/day)    (hour/ns)
> Performance:        6.052        3.966
> Finished mdrun on rank 0 Mon Dec 10 17:10:34 2018
> 
> 
> =========================================== ntmpi 2 : ntomp 8 ==============================================
> 
>       <======  ###############  ==>
>       <====  A V E R A G E S  ====>
>       <==  ###############  ======>
> 
>       Statistics over 11201 steps using 113 frames
> 
>   Energies (kJ/mol)
>          Angle       G96Angle    Proper Dih.  Improper Dih.          LJ-14
>    9.16403e+05    2.12953e+04    6.61725e+04    2.26296e+02    8.35215e+04
>     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>   -2.84508e+07   -1.43740e+05   -2.04658e+03    1.34647e+07    2.76232e+04
> Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>    5.93627e+01   -1.40166e+07    1.88847e+05   -1.38277e+07    3.00981e+02
> Pres. DC (bar) Pressure (bar)   Constr. rmsd
>   -2.88685e+00    8.53077e+01    0.00000e+00
> 
>   Total Virial (kJ/mol)
>    3.15233e+04   -6.80636e+02    9.80007e+01
>   -6.81075e+02    2.45640e+04   -1.40642e+03
>    9.81033e+01   -1.40643e+03    4.02877e+04
> 
>   Pressure (bar)
>    8.11163e+01    1.87348e+00   -2.03329e-01
>    1.87469e+00    1.09211e+02    3.83468e+00
>   -2.03613e-01    3.83470e+00    6.55961e+01
> 
>         T-PDMS         T-VMOS
>    2.99551e+02    3.84895e+02
> 
> 
>       P P   -   P M E   L O A D   B A L A N C I N G
> 
> NOTE: The PP/PME load balancing was limited by the maximum allowed grid scaling,
>       you might not have reached a good load balance.
> 
> PP/PME load balancing changed the cut-off and PME settings:
>           particle-particle                    PME
>            rcoulomb  rlist            grid      spacing   1/beta
>   initial  1.000 nm  1.000 nm     160 160 128   0.156 nm  0.320 nm
>   final    1.628 nm  1.628 nm      96  96  80   0.260 nm  0.521 nm
> cost-ratio           4.31             0.23
> (note that these numbers concern only part of the total PP and PME load)
> 
> 
>       M E G A - F L O P S   A C C O U N T I N G
> 
> NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
> RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
> W3=SPC/TIP3p  W4=TIP4p (single or pairs)
> V&F=Potential and force  V=Potential only  F=Force only
> 
> Computing:                               M-Number         M-Flops  % Flops
> -----------------------------------------------------------------------------
> Pair Search distance check            1057.319360        9515.874     0.0
> NxN Ewald Elec. + LJ [F]           1410325.411968    93081477.190    93.9
> NxN Ewald Elec. + LJ [V&F]           14378.367616     1538485.335     1.6
> 1,4 nonbonded interactions            2017.860150      181607.413     0.2
> Calc Weights                          2534.338260       91236.177     0.1
> Spread Q Bspline                     54065.882880      108131.766     0.1
> Gather F Bspline                     54065.882880      324395.297     0.3
> 3D-FFT                              383450.341906     3067602.735     3.1
> Solve PME                              113.199616        7244.775     0.0
> Reset In Box                             8.522460          25.567     0.0
> CG-CoM                                   8.597880          25.794     0.0
> Angles                                1597.486620      268377.752     0.3
> Propers                                336.366030       77027.821     0.1
> Impropers                                2.016180         419.365     0.0
> Pos. Restr.                             16.129440         806.472     0.0
> Virial                                   8.532630         153.587     0.0
> Update                                 844.779420       26188.162     0.0
> Stop-CM                                  8.597880          85.979     0.0
> Calc-Ekin                               17.044920         460.213     0.0
> Lincs                                 1753.732822      105223.969     0.1
> Lincs-Mat                            39788.083512      159152.334     0.2
> Constraint-V                          3507.309174       28058.473     0.0
> Constraint-Vir                           8.845375         212.289     0.0
> -----------------------------------------------------------------------------
> Total                                                99075914.342   100.0
> -----------------------------------------------------------------------------
> 
> 
>    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
> 
> av. #atoms communicated per step for force:  2 x 6810.8
> av. #atoms communicated per step for LINCS:  2 x 3029.3
> 
> 
> Dynamic load balancing report:
> DLB was off during the run due to low measured imbalance.
> Average load imbalance: 0.8%.
> The balanceable part of the MD step is 46%, load imbalance is computed from this.
> Part of the total run time spent waiting due to load imbalance: 0.4%.
> 
> 
>     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> 
> On 2 MPI ranks, each using 8 OpenMP threads
> 
> Computing:          Num   Num      Call    Wall time         Giga-Cycles
>                     Ranks Threads  Count      (s)         total sum    %
> -----------------------------------------------------------------------------
> Domain decomp.         2    8        113       1.532         90.505   1.4
> DD comm. load          2    8          4       0.000          0.027   0.0
> Neighbor search        2    8        113       0.442         26.107   0.4
> Launch GPU ops.        2    8      22402       1.230         72.668   1.1
> Comm. coord.           2    8      11088       0.894         52.844   0.8
> Force                  2    8      11201       8.166        482.534   7.5
> Wait + Comm. F         2    8      11201       0.672         39.720   0.6
> PME mesh               2    8      11201      61.637       3642.183  56.6
> Wait GPU NB nonloc.    2    8      11201       0.342         20.205   0.3
> Wait GPU NB local      2    8      11201       0.031          1.847   0.0
> NB X/F buffer ops.     2    8      44578       1.793        105.947   1.6
> Write traj.            2    8          4       0.040          2.386   0.0
> Update                 2    8      22402       4.148        245.121   3.8
> Constraints            2    8      22402      19.207       1134.940  17.6
> Comm. energies         2    8        113       0.006          0.354   0.0
> Rest                                           8.801        520.065   8.1
> -----------------------------------------------------------------------------
> Total                                        108.942       6437.452 100.0
> -----------------------------------------------------------------------------
> Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
> PME redist. X/F        2    8      22402       4.992        294.991   4.6
> PME spread             2    8      11201      16.979       1003.299  15.6
> PME gather             2    8      11201      11.687        690.563  10.7
> PME 3D-FFT             2    8      22402      21.648       1279.195  19.9
> PME 3D-FFT Comm.       2    8      22402       4.985        294.567   4.6
> PME solve Elec         2    8      11201       1.241         73.332   1.1
> -----------------------------------------------------------------------------
> 
>               Core t (s)   Wall t (s)        (%)
>       Time:     1743.073      108.942     1600.0
>                 (ns/day)    (hour/ns)
> Performance:        8.883        2.702
> Finished mdrun on rank 0 Mon Dec 10 17:01:45 2018