On Thu, Aug 20, 2015 at 5:52 PM, Szilárd Páll <pall.szil...@gmail.com> wrote:
> Hi,
>
> You're not pinning threads, and it seems that you're running on a large SMP
> machine! Assuming that the 512 threads reported (line 91) is correct, that's
> a 32-socket SMP machine, perhaps an SGI UV? In any case, Xeon E5-4xxx is
> typically deployed in 4-8 socket installations,

Correction: I confused the E5-46xx with the E7 series. These are 2-4 socket,
it seems. In any case, the 512 threads reported still suggests a large SMP
machine.

> so your 8 threads will be floating around on a number of CPUs, which ruins
> your performance - and likely contributes to the varying and large load
> imbalance.
>
> My advice:
> - don't ignore notes/warnings issued by mdrun (line 366; it should be on
>   the standard out too) - we put quite some thought into spamming users
>   only when relevant :)
> - pin mdrun and/or its threads, either with "-pin on" (and -pinoffset if
>   needed) or with whatever tools your admins provide/recommend
>
> [Extras: consider using FFTW; even with the Intel compilers it's often
> faster for our small FFTs than MKL, and the GNU compiler instead of Intel
> is often faster too.]
>
> Fixing the above issues should not only reduce imbalance but most likely
> also allow you to gain quite some simulation performance! Let us know if
> it worked.
>
> Cheers,
>
> --
> Szilárd
>
> On Thu, Aug 20, 2015 at 5:08 PM, Nash, Anthony <a.n...@ucl.ac.uk> wrote:
>
>> Hi Mark,
>>
>> Many thanks for looking into this.
>>
>> One of the log files (the job hasn't finished running) is here:
>> https://www.dropbox.com/s/zwrro54yni2uxtn/umb_3_umb.log?dl=0
>>
>> The system is a soluble collagenase in water with a collagen substrate
>> and two zinc co-factors. There are 287562 atoms in the system.
>>
>> Please let me know if you need to know anything else. Thanks!
>>
>> Anthony
>>
>>
>> On 20/08/2015 11:39, "Mark Abraham" <mark.j.abra...@gmail.com> wrote:
>>
>> >Hi,
>> >
>> >In cases like this, it's good to describe what's in your simulation and
>> >share the full .log file on a file-sharing service, so we can see both
>> >the things mdrun reports early and late.
>> >
>> >Mark
>> >
>> >On Thu, Aug 20, 2015 at 8:22 AM Nash, Anthony <a.n...@ucl.ac.uk> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I appear to have a very high load imbalance on some of my runs. Values
>> >> start from approx. 7% up to 31.8%, with a reported vol min/aver of
>> >> around 0.6 (I haven't found one under half yet).
>> >>
>> >> When I look through the .log file at the start of the run I see:
>> >>
>> >> Initializing Domain Decomposition on 8 ranks
>> >> Dynamic load balancing: auto
>> >> Will sort the charge groups at every domain (re)decomposition
>> >> Initial maximum inter charge-group distances:
>> >>     two-body bonded interactions: 0.514 nm, LJ-14, atoms 3116 3123
>> >>   multi-body bonded interactions: 0.429 nm, Proper Dih., atoms 3116 3123
>> >> Minimum cell size due to bonded interactions: 0.472 nm
>> >> Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.862 nm
>> >> Estimated maximum distance required for P-LINCS: 0.862 nm
>> >> This distance will limit the DD cell size, you can override this with -rcon
>> >> Using 0 separate PME ranks, as there are too few total ranks for efficient splitting
>> >> Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
>> >> Optimizing the DD grid for 8 cells with a minimum initial size of 1.077 nm
>> >> The maximum allowed number of cells is: X 12 Y 12 Z 12
>> >> Domain decomposition grid 4 x 2 x 1, separate PME ranks 0
>> >> PME domain decomposition: 4 x 2 x 1
>> >> Domain decomposition rank 0, coordinates 0 0 0
>> >> Using 8 MPI processes
>> >> Using 1 OpenMP thread per MPI process
>> >>
>> >> Having a quick look through the documentation, I see that I should
>> >> consider implementing the Verlet cut-off scheme (which I am) and adjust
>> >> the number of PME nodes/cut-off and PME grid spacing. Would this simply
>> >> be a case of throwing more cores at the simulation, or must I play
>> >> around with P-LINCS parameters?
>> >>
>> >> Thanks
>> >> Anthony
--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.
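[Editor's note: Szilárd's pinning advice above could be sketched as a command line like the following. This is an illustrative sketch only, not the poster's actual invocation: the rank count matches the 8 MPI ranks shown in the log, the `-deffnm umb_3_umb` filename is inferred from the linked log file, and the `-pinoffset` value is a placeholder to adapt when sharing a node.]

```shell
# Thread-MPI build: run 8 ranks and let mdrun pin its own threads.
# -pinoffset shifts the first pinned thread to a given hardware thread
# (the value 0 here is illustrative; non-zero offsets matter when
# several jobs share one node).
mdrun -ntmpi 8 -pin on -pinoffset 0 -deffnm umb_3_umb

# External-MPI build: the same affinity flag, or the site's own tools
# (e.g. numactl/taskset wrappers) as the admins recommend.
mpirun -np 8 mdrun_mpi -pin on -deffnm umb_3_umb
```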