I got up to 25-26 ns/day with my 4-replica system (the same logic scaled up to 73 replicas), which I think is reasonable. Could I do better?
mpirun -np 48 gmx_mpi mdrun -ntomp 1 -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000

I have tried following the manual, but I don't think I'm doing it right; I keep getting errors. If you have a minute to suggest how I could do this, I would appreciate it.

Log file accounting:

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 12 MPI ranks

 Computing:               Num   Num      Call    Wall time    Giga-Cycles
                          Ranks Threads  Count      (s)       total sum     %
-----------------------------------------------------------------------------
 Domain decomp.            12    1      26702      251.490      8731.137   1.5
 DD comm. load             12    1      25740        1.210        42.003   0.0
 DD comm. bounds           12    1      26396        9.627       334.238   0.1
 Neighbor search           12    1      25862      283.564      9844.652   1.7
 Launch GPU ops.           12    1    5004002      343.309     11918.867   2.0
 Comm. coord.              12    1    2476139      508.526     17654.811   3.0
 Force                     12    1    2502001      419.341     14558.495   2.5
 Wait + Comm. F            12    1    2502001      347.752     12073.100   2.1
 PME mesh                  12    1    2502001    11721.893    406955.915  69.2
 Wait Bonded GPU           12    1       2503        0.008         0.285   0.0
 Wait GPU NB nonloc.       12    1    2502001       48.918      1698.317   0.3
 Wait GPU NB local         12    1    2502001       19.475       676.141   0.1
 NB X/F buffer ops.        12    1    9956280      753.489     26159.337   4.5
 Write traj.               12    1        519        1.078        37.427   0.0
 Update                    12    1    2502001      434.272     15076.886   2.6
 Constraints               12    1    2502001      701.800     24364.800   4.1
 Comm. energies            12    1     125942       36.574      1269.776   0.2
 Rest                                             1047.855     36378.988   6.2
-----------------------------------------------------------------------------
 Total                                           16930.182    587775.176 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F           12    1    5004002     1650.247     57292.604   9.7
 PME spread                12    1    2502001     4133.126    143492.183  24.4
 PME gather                12    1    2502001     2303.327     79965.968  13.6
 PME 3D-FFT                12    1    5004002     2119.410     73580.828  12.5
 PME 3D-FFT Comm.          12    1    5004002      918.318     31881.804   5.4
 PME solve Elec            12    1    2502001      584.446     20290.548   3.5
-----------------------------------------------------------------------------

Best, Miro

On Tue, Mar 31, 2020 at 9:58 AM Szilárd Páll <pall.szil...@gmail.com> wrote:
>
> On Sun, Mar 29, 2020 at 3:56 AM Miro Astore <miro.ast...@gmail.com> wrote:
>
> > Hi everybody. I've been experimenting with REMD for my system running
> > on 48 cores with 4 GPUs (I will need to scale up to 73 replicas,
> > because this is a complicated system with many DOF; I'm open to being
> > told this is all a silly idea).
>
> It is a bad idea: you should have at least 1 physical core per replica, and
> with a large system ideally more.
> However, if you are going for high efficiency (aggregate ns/day per physical
> node), always put at least 2 replicas per GPU.
>
> > My run configuration is
> > mpirun -np 4 --map-by numa gmx_mpi mdrun -cpi memb_prod1.cpt -ntomp 11
> > -v -deffnm memb_prod1 -multidir 1 2 3 4 -replex 1000
> >
> > The best I can squeeze out of this is 9 ns/day. In a non-replica
> > simulation I can hit 50 ns/day with a single GPU and 12 cores.
>
> That is abnormal and indicates that:
> - either something is wrong with the hardware mapping / assignment in your
>   run; use simply "-pin on" and let mdrun manage thread pinning (that
>   map-by-numa is certainly not optimal), and I advise against tweaking the
>   thread count to odd numbers like 11 (just use a quarter); or
> - your exchange overhead is very high (check the communication cost in the
>   log).
>
> If you share some log files of a standalone and a replex run, we can advise
> where the performance loss comes from.
>
> Cheers,
> --
> Szilárd
>
> > Looking at my accounting, for a single replica 52% of the time is being
> > spent in the "Force" category, with 92% of my Mflops going into NxN
> > Ewald Elec. + LJ [F].
> >
> > I'm wondering what I could do to reduce this bottleneck, if anything.
> >
> > Thank you.
> > --
> > Miro A. Astore (he/him)
> > PhD Candidate | Computational Biophysics
> > Office 434 A28 School of Physics
> > University of Sydney

--
Miro A. Astore (he/him)
PhD Candidate | Computational Biophysics
Office 434 A28 School of Physics
University of Sydney
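
A minimal sketch of the 4-replica launch asked about at the top of this message, following the pinning and replicas-per-GPU advice quoted in the thread. It assumes a single node with 48 cores and 4 GPUs, reuses the directory names and -deffnm prefix from the command above, and leaves GPU assignment to mdrun's automatic mapping; it is a starting point to test, not a verified configuration.

  # One replica per 12 cores: 48 MPI ranks are split evenly over the 4
  # -multidir directories (12 ranks per replica), 1 OpenMP thread per rank,
  # with thread pinning handled by mdrun instead of mpirun's --map-by.
  mpirun -np 48 gmx_mpi mdrun -multidir 1 2 3 4 \
         -ntomp 1 -pin on -replex 1000 -deffnm memb_prod1 -v

  # Aggregate-throughput variant of the "at least 2 replicas per GPU" advice:
  # 8 replicas (directories 1..8 are hypothetical here), 1 rank x 6 OpenMP
  # threads each, so two replicas end up sharing each of the 4 GPUs.
  mpirun -np 8 gmx_mpi mdrun -multidir 1 2 3 4 5 6 7 8 \
         -ntomp 6 -pin on -replex 1000 -deffnm memb_prod1 -v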
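
Since the PME mesh row dominates the accounting above (~69% of wall time, with spread, gather, and 3D-FFT as the largest pieces), one option that may be worth testing, assuming a GROMACS build and GPU generation that support GPU PME, is to offload the PME mesh work to a GPU. This is not something discussed in the thread; with more than one rank per replica it requires a single dedicated PME rank per simulation, and whether it actually helps has to be judged from the resulting log timings.

  # Hypothetical variant: short-range nonbondeds stay on the GPU (-nb gpu),
  # the PME mesh work is offloaded as well (-pme gpu) and runs on one
  # dedicated PME rank per replica (-npme 1).
  mpirun -np 48 gmx_mpi mdrun -multidir 1 2 3 4 \
         -ntomp 1 -pin on -nb gpu -pme gpu -npme 1 \
         -replex 1000 -deffnm memb_prod1 -v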