Carsten,

Thanks for the response. My mistake - it was the GTX 980 from Fig. 3; I was recalling from memory. I assume that similar results would be achieved with the 1060s.
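For concreteness, the two-GPU launch lines I am describing would look roughly like the following (a sketch only - the -gputasks mapping and -pme gpu offload assume GROMACS 2018 syntax; -resethway resets the timing counters halfway through the run, so the benchmark numbers exclude initialization and PP-PME tuning):

  # one rank, 16 threads: short-range NB on GPU 0, PME offloaded to GPU 1
  gmx mdrun -ntmpi 1 -ntomp 16 -nb gpu -pme gpu -gputasks 01 -resethway

  # four ranks x four threads: two PP tasks per GPU, PME on the CPU
  gmx mdrun -ntmpi 4 -ntomp 4 -nb gpu -gputasks 0011 -resethway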
No, I did not reset. My results were a compilation of 4-5 runs, each under slightly different conditions, on two computers - all with the same outcome, that is, ugh! Mark had asked for the log outputs, indicating that some useful conclusions could be drawn from them.

Paul

> On Dec 12, 2018, at 9:02 AM, Kutzner, Carsten <ckut...@gwdg.de> wrote:
>
> Hi Paul,
>
>> On 12. Dec 2018, at 15:36, pbusc...@q.com wrote:
>>
>> Dear users (one more try),
>>
>> I am trying to use two GPU cards to improve modeling speed. The computer
>> described in the log files is used to iron out models, and I am using it to
>> learn how to use two GPU cards before purchasing two new RTX 2080 Ti's. The
>> CPU is an 8-core, 16-thread AMD and the GPUs are two GTX 1060s; there are
>> 50,000 atoms in the model.
>>
>> With ntmpi and ntomp set to 1:16, the ns/day rating is approx. 12-16; with
>> auto (4:4), 2:8, or any other combination factoring to 16, it is ~6-8, i.e.
>> adding a card cuts efficiency in half. The average load imbalance is less
>> than 3.4% for the multi-card setup.
>>
>> I am not at this point trying to maximize efficiency, but only to show some
>> improvement going from one to two cards. According to a 2015 paper from
>> the GROMACS group, "Best bang for your buck: GPU nodes for GROMACS
>> biomolecular simulations", I should expect maybe (at best) a 50%
>> improvement for 90k atoms (with 2x GTX 970).
>
> We did not benchmark the GTX 970 in that publication.
>
> But from Table 6 you can see that we also had quite a few cases with our 80k
> benchmark where, going from 1 to 2 GPUs, simulation speed did not increase
> much: e.g. for the E5-2670v2, going from one to two GTX 980 GPUs led to an
> increase of 10 percent.
>
> Did you use counter resetting for the benchmarks?
>
> Carsten
>
>
>> What bothers me in my initial attempts is that my simulations became slower
>> by adding the second GPU - it is frustrating, to say the least. It's like
>> swimming backwards.
>>
>> I know I am missing - as a minimum - the correct setup for mdrun, and
>> suggestions would be welcome.
>>
>> The output from the last section of the log files is included below.
>>
>> =========================== ntmpi 1 : ntomp 16 ==============================
>>
>> <====== ############### ==>
>> <==== A V E R A G E S ====>
>> <== ############### ======>
>>
>> Statistics over 29301 steps using 294 frames
>>
>> Energies (kJ/mol)
>> Angle G96Angle Proper Dih. Improper Dih. LJ-14
>> 9.17533e+05 2.27874e+04 6.64128e+04 2.31214e+02 8.34971e+04
>> Coulomb-14 LJ (SR) Disper. corr. Coulomb (SR) Coul. recip.
>> -2.84567e+07 -1.43385e+05 -2.04658e+03 1.33320e+07 1.59914e+05
>> Position Rest. Potential Kinetic En. Total Energy Temperature
>> 7.79893e+01 -1.40196e+07 1.88467e+05 -1.38312e+07 3.00376e+02
>> Pres. DC (bar) Pressure (bar) Constr. rmsd
>> -2.88685e+00 3.75436e+01 0.00000e+00
>>
>> Total Virial (kJ/mol)
>> 5.27555e+04 -4.87626e+02 1.86144e+02
>> -4.87648e+02 4.04479e+04 -1.91959e+02
>> 1.86177e+02 -1.91957e+02 5.45671e+04
>>
>> Pressure (bar)
>> 2.22202e+01 1.27887e+00 -4.71738e-01
>> 1.27893e+00 6.48135e+01 5.12638e-01
>> -4.71830e-01 5.12632e-01 2.55971e+01
>>
>> T-PDMS T-VMOS
>> 2.99822e+02 3.32834e+02
>>
>> M E G A - F L O P S A C C O U N T I N G
>>
>> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
>> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
>> W3=SPC/TIP3p W4=TIP4p (single or pairs)
>> V&F=Potential and force V=Potential only F=Force only
>>
>> Computing: M-Number M-Flops % Flops
>> -----------------------------------------------------------------------------
>> Pair Search distance check 2349.753264 21147.779 0.0
>> NxN Ewald Elec. + LJ [F] 1771584.591744 116924583.055 96.6
>> NxN Ewald Elec. + LJ [V&F] 17953.091840 1920980.827 1.6
>> 1,4 nonbonded interactions 5278.575150 475071.763 0.4
>> Shift-X 22.173480 133.041 0.0
>> Angles 4178.908620 702056.648 0.6
>> Propers 879.909030 201499.168 0.2
>> Impropers 5.274180 1097.029 0.0
>> Pos. Restr. 42.193440 2109.672 0.0
>> Virial 22.186710 399.361 0.0
>> Update 2209.881420 68506.324 0.1
>> Stop-CM 22.248900 222.489 0.0
>> Calc-Ekin 44.346960 1197.368 0.0
>> Lincs 4414.639320 264878.359 0.2
>> Lincs-Mat 100297.229760 401188.919 0.3
>> Constraint-V 8829.127980 70633.024 0.1
>> Constraint-Vir 22.147020 531.528 0.0
>> -----------------------------------------------------------------------------
>> Total 121056236.355 100.0
>> -----------------------------------------------------------------------------
>>
>> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>>
>> On 1 MPI rank, each using 16 OpenMP threads
>>
>> Computing: Num Num Call Wall time Giga-Cycles
>> Ranks Threads Count (s) total sum %
>> -----------------------------------------------------------------------------
>> Neighbor search 1 16 294 2.191 129.485 1.0
>> Launch GPU ops. 1 16 58602 4.257 251.544 2.0
>> Force 1 16 29301 23.769 1404.510 11.3
>> Wait PME GPU gather 1 16 29301 33.740 1993.695 16.0
>> Reduce GPU PME F 1 16 29301 7.244 428.079 3.4
>> Wait GPU NB local 1 16 29301 60.054 3548.612 28.5
>> NB X/F buffer ops. 1 16 58308 9.823 580.459 4.7
>> Write traj. 1 16 7 0.119 7.048 0.1
>> Update 1 16 58602 11.089 655.275 5.3
>> Constraints 1 16 58602 40.378 2385.992 19.2
>> Rest 17.743 1048.462 8.4
>> -----------------------------------------------------------------------------
>> Total 210.408 12433.160 100.0
>> -----------------------------------------------------------------------------
>>
>> Core t (s) Wall t (s) (%)
>> Time: 3366.529 210.408 1600.0
>> (ns/day) (hour/ns)
>> Performance: 12.032 1.995
>> Finished mdrun on rank 0 Mon Dec 10 17:17:04 2018
>>
>> =========================== ntmpi and ntomp auto ( 4:4 ) ====================
>>
>> <====== ############### ==>
>> <==== A V E R A G E S ====>
>> <== ############### ======>
>>
>> Statistics over 3301 steps using 34 frames
>>
>> Energies (kJ/mol)
>> Angle G96Angle Proper Dih. Improper Dih. LJ-14
>> 9.20586e+05 1.95534e+04 6.56058e+04 2.21093e+02 8.56673e+04
>> Coulomb-14 LJ (SR) Disper. corr. Coulomb (SR) Coul. recip.
>> -2.84553e+07 -1.44595e+05 -2.04658e+03 1.34518e+07 4.26167e+04
>> Position Rest. Potential Kinetic En. Total Energy Temperature
>> 3.83653e+01 -1.40159e+07 1.90353e+05 -1.38255e+07 3.03381e+02
>> Pres. DC (bar) Pressure (bar) Constr. rmsd
>> -2.88685e+00 2.72913e+02 0.00000e+00
>>
>> Total Virial (kJ/mol)
>> -5.05948e+04 -3.29107e+03 4.84786e+02
>> -3.29135e+03 -3.42006e+04 -3.32392e+03
>> 4.84606e+02 -3.32403e+03 -2.06849e+04
>>
>> Pressure (bar)
>> 3.09713e+02 8.98192e+00 -1.19828e+00
>> 8.98270e+00 2.73248e+02 8.99543e+00
>> -1.19778e+00 8.99573e+00 2.35776e+02
>>
>> T-PDMS T-VMOS
>> 2.98623e+02 5.82467e+02
>>
>> P P - P M E L O A D B A L A N C I N G
>>
>> NOTE: The PP/PME load balancing was limited by the maximum allowed grid scaling,
>> you might not have reached a good load balance.
>>
>> PP/PME load balancing changed the cut-off and PME settings:
>> particle-particle PME
>> rcoulomb rlist grid spacing 1/beta
>> initial 1.000 nm 1.000 nm 160 160 128 0.156 nm 0.320 nm
>> final 1.628 nm 1.628 nm 96 96 80 0.260 nm 0.521 nm
>> cost-ratio 4.31 0.23
>> (note that these numbers concern only part of the total PP and PME load)
>>
>> M E G A - F L O P S A C C O U N T I N G
>>
>> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
>> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
>> W3=SPC/TIP3p W4=TIP4p (single or pairs)
>> V&F=Potential and force V=Potential only F=Force only
>>
>> Computing: M-Number M-Flops % Flops
>> -----------------------------------------------------------------------------
>> Pair Search distance check 285.793872 2572.145 0.0
>> NxN Ewald Elec. + LJ [F] 367351.034688 24245168.289 92.1
>> NxN Ewald Elec. + LJ [V&F] 3841.181056 411006.373 1.6
>> 1,4 nonbonded interactions 594.675150 53520.763 0.2
>> Calc Weights 746.884260 26887.833 0.1
>> Spread Q Bspline 15933.530880 31867.062 0.1
>> Gather F Bspline 15933.530880 95601.185 0.4
>> 3D-FFT 154983.295306 1239866.362 4.7
>> Solve PME 40.079616 2565.095 0.0
>> Reset In Box 2.564280 7.693 0.0
>> CG-CoM 2.639700 7.919 0.0
>> Angles 470.788620 79092.488 0.3
>> Propers 99.129030 22700.548 0.1
>> Impropers 0.594180 123.589 0.0
>> Pos. Restr. 4.753440 237.672 0.0
>> Virial 2.570400 46.267 0.0
>> Update 248.961420 7717.804 0.0
>> Stop-CM 2.639700 26.397 0.0
>> Calc-Ekin 5.128560 138.471 0.0
>> Lincs 557.713246 33462.795 0.1
>> Lincs-Mat 12624.363456 50497.454 0.2
>> Constraint-V 1115.257670 8922.061 0.0
>> Constraint-Vir 2.871389 68.913 0.0
>> -----------------------------------------------------------------------------
>> Total 26312105.181 100.0
>> -----------------------------------------------------------------------------
>>
>> D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>>
>> av. #atoms communicated per step for force: 2 x 16748.9
>> av. #atoms communicated per step for LINCS: 2 x 9361.6
>>
>> Dynamic load balancing report:
>> DLB was off during the run due to low measured imbalance.
>> Average load imbalance: 3.4%.
>> The balanceable part of the MD step is 46%, load imbalance is computed from this.
>> Part of the total run time spent waiting due to load imbalance: 1.6%.
>>
>> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>>
>> On 4 MPI ranks, each using 4 OpenMP threads
>>
>> Computing: Num Num Call Wall time Giga-Cycles
>> Ranks Threads Count (s) total sum %
>> -----------------------------------------------------------------------------
>> Domain decomp. 4 4 34 0.457 26.976 1.0
>> DD comm. load 4 4 2 0.000 0.008 0.0
>> Neighbor search 4 4 34 0.138 8.160 0.3
>> Launch GPU ops. 4 4 6602 0.441 26.070 0.9
>> Comm. coord. 4 4 3267 0.577 34.081 1.2
>> Force 4 4 3301 2.298 135.761 4.9
>> Wait + Comm. F 4 4 3301 0.276 16.330 0.6
>> PME mesh 4 4 3301 25.822 1525.817 54.8
>> Wait GPU NB nonloc. 4 4 3301 0.132 7.819 0.3
>> Wait GPU NB local 4 4 3301 0.012 0.724 0.0
>> NB X/F buffer ops. 4 4 13136 0.471 27.822 1.0
>> Write traj. 4 4 2 0.014 0.839 0.0
>> Update 4 4 6602 1.006 59.442 2.1
>> Constraints 4 4 6602 6.926 409.290 14.7
>> Comm. energies 4 4 34 0.009 0.524 0.0
>> Rest 8.548 505.108 18.1
>> -----------------------------------------------------------------------------
>> Total 47.127 2784.772 100.0
>> -----------------------------------------------------------------------------
>> Breakdown of PME mesh computation
>> -----------------------------------------------------------------------------
>> PME redist. X/F 4 4 6602 2.538 149.998 5.4
>> PME spread 4 4 3301 6.055 357.770 12.8
>> PME gather 4 4 3301 3.432 202.814 7.3
>> PME 3D-FFT 4 4 6602 10.559 623.925 22.4
>> PME 3D-FFT Comm. 4 4 6602 2.691 158.993 5.7
>> PME solve Elec 4 4 3301 0.521 30.805 1.1
>> -----------------------------------------------------------------------------
>>
>> Core t (s) Wall t (s) (%)
>> Time: 754.033 47.127 1600.0
>> (ns/day) (hour/ns)
>> Performance: 6.052 3.966
>> Finished mdrun on rank 0 Mon Dec 10 17:10:34 2018
>>
>> =========================================== ntmpi 2 : ntomp 8 ===============
>>
>> <====== ############### ==>
>> <==== A V E R A G E S ====>
>> <== ############### ======>
>>
>> Statistics over 11201 steps using 113 frames
>>
>> Energies (kJ/mol)
>> Angle G96Angle Proper Dih. Improper Dih. LJ-14
>> 9.16403e+05 2.12953e+04 6.61725e+04 2.26296e+02 8.35215e+04
>> Coulomb-14 LJ (SR) Disper. corr. Coulomb (SR) Coul. recip.
>> -2.84508e+07 -1.43740e+05 -2.04658e+03 1.34647e+07 2.76232e+04
>> Position Rest. Potential Kinetic En. Total Energy Temperature
>> 5.93627e+01 -1.40166e+07 1.88847e+05 -1.38277e+07 3.00981e+02
>> Pres. DC (bar) Pressure (bar) Constr. rmsd
>> -2.88685e+00 8.53077e+01 0.00000e+00
>>
>> Total Virial (kJ/mol)
>> 3.15233e+04 -6.80636e+02 9.80007e+01
>> -6.81075e+02 2.45640e+04 -1.40642e+03
>> 9.81033e+01 -1.40643e+03 4.02877e+04
>>
>> Pressure (bar)
>> 8.11163e+01 1.87348e+00 -2.03329e-01
>> 1.87469e+00 1.09211e+02 3.83468e+00
>> -2.03613e-01 3.83470e+00 6.55961e+01
>>
>> T-PDMS T-VMOS
>> 2.99551e+02 3.84895e+02
>>
>> P P - P M E L O A D B A L A N C I N G
>>
>> NOTE: The PP/PME load balancing was limited by the maximum allowed grid scaling,
>> you might not have reached a good load balance.
>>
>> PP/PME load balancing changed the cut-off and PME settings:
>> particle-particle PME
>> rcoulomb rlist grid spacing 1/beta
>> initial 1.000 nm 1.000 nm 160 160 128 0.156 nm 0.320 nm
>> final 1.628 nm 1.628 nm 96 96 80 0.260 nm 0.521 nm
>> cost-ratio 4.31 0.23
>> (note that these numbers concern only part of the total PP and PME load)
>>
>> M E G A - F L O P S A C C O U N T I N G
>>
>> NB=Group-cutoff nonbonded kernels NxN=N-by-N cluster Verlet kernels
>> RF=Reaction-Field VdW=Van der Waals QSTab=quadratic-spline table
>> W3=SPC/TIP3p W4=TIP4p (single or pairs)
>> V&F=Potential and force V=Potential only F=Force only
>>
>> Computing: M-Number M-Flops % Flops
>> -----------------------------------------------------------------------------
>> Pair Search distance check 1057.319360 9515.874 0.0
>> NxN Ewald Elec. + LJ [F] 1410325.411968 93081477.190 93.9
>> NxN Ewald Elec. + LJ [V&F] 14378.367616 1538485.335 1.6
>> 1,4 nonbonded interactions 2017.860150 181607.413 0.2
>> Calc Weights 2534.338260 91236.177 0.1
>> Spread Q Bspline 54065.882880 108131.766 0.1
>> Gather F Bspline 54065.882880 324395.297 0.3
>> 3D-FFT 383450.341906 3067602.735 3.1
>> Solve PME 113.199616 7244.775 0.0
>> Reset In Box 8.522460 25.567 0.0
>> CG-CoM 8.597880 25.794 0.0
>> Angles 1597.486620 268377.752 0.3
>> Propers 336.366030 77027.821 0.1
>> Impropers 2.016180 419.365 0.0
>> Pos. Restr. 16.129440 806.472 0.0
>> Virial 8.532630 153.587 0.0
>> Update 844.779420 26188.162 0.0
>> Stop-CM 8.597880 85.979 0.0
>> Calc-Ekin 17.044920 460.213 0.0
>> Lincs 1753.732822 105223.969 0.1
>> Lincs-Mat 39788.083512 159152.334 0.2
>> Constraint-V 3507.309174 28058.473 0.0
>> Constraint-Vir 8.845375 212.289 0.0
>> -----------------------------------------------------------------------------
>> Total 99075914.342 100.0
>> -----------------------------------------------------------------------------
>>
>> D O M A I N D E C O M P O S I T I O N S T A T I S T I C S
>>
>> av. #atoms communicated per step for force: 2 x 6810.8
>> av. #atoms communicated per step for LINCS: 2 x 3029.3
>>
>> Dynamic load balancing report:
>> DLB was off during the run due to low measured imbalance.
>> Average load imbalance: 0.8%.
>> The balanceable part of the MD step is 46%, load imbalance is computed from this.
>> Part of the total run time spent waiting due to load imbalance: 0.4%.
>>
>> R E A L C Y C L E A N D T I M E A C C O U N T I N G
>>
>> On 2 MPI ranks, each using 8 OpenMP threads
>>
>> Computing: Num Num Call Wall time Giga-Cycles
>> Ranks Threads Count (s) total sum %
>> -----------------------------------------------------------------------------
>> Domain decomp. 2 8 113 1.532 90.505 1.4
>> DD comm. load 2 8 4 0.000 0.027 0.0
>> Neighbor search 2 8 113 0.442 26.107 0.4
>> Launch GPU ops. 2 8 22402 1.230 72.668 1.1
>> Comm. coord. 2 8 11088 0.894 52.844 0.8
>> Force 2 8 11201 8.166 482.534 7.5
>> Wait + Comm. F 2 8 11201 0.672 39.720 0.6
>> PME mesh 2 8 11201 61.637 3642.183 56.6
>> Wait GPU NB nonloc. 2 8 11201 0.342 20.205 0.3
>> Wait GPU NB local 2 8 11201 0.031 1.847 0.0
>> NB X/F buffer ops. 2 8 44578 1.793 105.947 1.6
>> Write traj. 2 8 4 0.040 2.386 0.0
>> Update 2 8 22402 4.148 245.121 3.8
>> Constraints 2 8 22402 19.207 1134.940 17.6
>> Comm. energies 2 8 113 0.006 0.354 0.0
>> Rest 8.801 520.065 8.1
>> -----------------------------------------------------------------------------
>> Total 108.942 6437.452 100.0
>> -----------------------------------------------------------------------------
>> Breakdown of PME mesh computation
>> -----------------------------------------------------------------------------
>> PME redist. X/F 2 8 22402 4.992 294.991 4.6
>> PME spread 2 8 11201 16.979 1003.299 15.6
>> PME gather 2 8 11201 11.687 690.563 10.7
>> PME 3D-FFT 2 8 22402 21.648 1279.195 19.9
>> PME 3D-FFT Comm. 2 8 22402 4.985 294.567 4.6
>> PME solve Elec 2 8 11201 1.241 73.332 1.1
>> -----------------------------------------------------------------------------
>>
>> Core t (s) Wall t (s) (%)
>> Time: 1743.073 108.942 1600.0
>> (ns/day) (hour/ns)
>> Performance: 8.883 2.702
>> Finished mdrun on rank 0 Mon Dec 10 17:01:45 2018
>>
>> --
>> Gromacs Users mailing list
>>
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a
>> mail to gmx-users-requ...@gromacs.org.