Hi Justin and Mark,

Thanks once again for getting back.
I've found the problem - it's actually a known bug already:

http://redmine.gromacs.org/issues/901

The dispersion correction is multiplied by the number of processes (I found
this after taking a closer look at my md.log files to see where the energy
was changing)! I guess this means I should use the serial version for
meaningful binding energies. It also looks like it will be fixed in version
4.5.6.
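In case it helps anyone else: nearly every step from N to N+1 cores/threads
in the numbers quoted below lowers the energy by the same ~179.0 (kJ/mol,
GROMACS's default units), which is just what you'd expect if one extra copy
of the dispersion correction is added per process. A quick way to check your
own runs is something along these lines - the em_np* directory names are only
placeholders for wherever your md.log files live, and "Disper. corr." is the
label GROMACS prints in the md.log energy table when DispCorr is switched on:

    # print the first energy-table header + values line for each rank count
    for n in 1 2 3 4; do
        echo "== $n MPI processes =="
        grep -A 1 "Disper. corr." em_np${n}/md.log | head -n 2
    done

If that term grows with the number of processes instead of staying fixed,
it's the same issue as bug 901.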
Thank you again, I really appreciate your help.

Steve

> On 30/05/2012 9:42 PM, Stephen Cox wrote:
> > Hi Justin,
> >
> > Thanks for getting back and posting the links.
> >
> > On 5/29/12 6:22 AM, Stephen Cox wrote:
> > > Hi,
> > >
> > > I'm running a number of energy minimizations on a clathrate supercell
> > > and I get quite significantly different values for the total energy
> > > depending on the number of mpi processes / number of threads I use.
> > > More specifically, some numbers I get are:
> > >
> > > #cores    energy
> > > 1         -2.41936409202696e+04
> > > 2         -2.43726425776809e+04
> > > 3         -2.45516442350804e+04
> > > 4         -2.47003944216983e+04
> > >
> > > #threads  energy
> > > 1         -2.41936409202696e+04
> > > 2         -2.43726425776792e+04
> > > 3         -2.45516442350804e+04
> > > 4         -2.47306458924815e+04
> > >
> > > I'd expect some numerical noise, but these differences seem too large
> > > for that.
> >
> > The difference is only 2%, which by MD standards, is quite good, I'd say ;)
> > Consider the discussion here:
> >
> > I agree for MD this wouldn't be too bad, but I'd expect energy
> > minimization to get very close to the same local minimum from a given
> > starting configuration. The thing is I want to compute a binding curve
> > for my clathrate and compare to DFT values for the binding energy
> > (amongst other things), and the difference in energy between different
> > numbers of cores is rather significant for this purpose.
>
> Given the usual roughness of the PE surface to which you are minimizing,
> some variation in end point is expected.
>
> > Furthermore, if I only calculate the energy for nsteps = 0 (i.e. a
> > single point energy for identical structures) I get the same trend as
> > above (both mpi/openmp with domain/particle decomposition). Surely
> > there shouldn't be such a large difference in energy for a single
> > point calculation?
>
> nsteps = 0 is not strictly a single-point energy, since the constraints
> act before EM step 0. mdrun -s -rerun will give a single point. This
> probably won't change your observations. You should also be sure you're
> making observations with the latest release (4.5.5).
>
> If you can continue to observe this trend for more processors
> (overallocating?), then you may have evidence of a problem - but a full
> system description and an .mdp file will be in order also.
>
> Mark
>
> > http://www.gromacs.org/Documentation/Terminology/Reproducibility
> >
> > To an extent, the information here may also be relevant:
> >
> > http://www.gromacs.org/Documentation/How-tos/Extending_Simulations#Exact_vs_binary_identical_continuation
> >
> > > Before submitting a bug report, I'd like to check:
> > > a) if someone has seen something similar;
> >
> > Sure. Energies can be different due to a whole host of factors (discussed
> > above), and MPI only complicates matters.
> >
> > > b) should I just trust the serial version?
> >
> > Maybe, but I don't know that there's evidence to say that any of the
> > above tests are more or less accurate than the others. What happens if
> > you run with mdrun -reprod on all your tests?
> >
> > Running with -reprod produces the same trend as above. If it was
> > numerical noise, I would have thought that the numbers would fluctuate
> > around some average value, not follow a definite trend where the
> > energy decreases with the number of cores/threads...
> >
> > > c) have I simply done something stupid (grompp.mdp appended below);
> >
> > Nope, looks fine.
> >
> > -Justin
> >
> > Thanks again for getting back to me.
>
> ------------------------------
>
> Message: 2
> Date: Wed, 30 May 2012 07:51:02 -0400
> From: "Justin A. Lemkul" <[email protected]>
> Subject: Re: [gmx-users] Re: Possible bug: energy changes with the
>          number of nodes for energy minimization
>
> On 5/30/12 7:42 AM, Stephen Cox wrote:
> > [...]
> >
> > I agree for MD this wouldn't be too bad, but I'd expect energy
> > minimization to get very close to the same local minimum from a given
> > starting configuration. The thing is I want to compute a binding curve
> > for my clathrate and compare to DFT values for the binding energy
> > (amongst other things), and the difference in energy between different
> > numbers of cores is rather significant for this purpose.
>
> I think the real issue comes down to how you're going to calculate binding
> energy. I would still expect that with sufficient MD sampling, the
> differences should be small or statistically insignificant given the
> nature of MD calculations. EM will likely be very sensitive to the nature
> of how it is run (MPI vs. serial, etc.) since even the tiny rounding
> errors and other factors described below will cause changes in how the EM
> algorithm proceeds. For most purposes, such differences are irrelevant as
> EM is only a preparatory step for more intense calculations.
>
> > Furthermore, if I only calculate the energy for nsteps = 0 (i.e. a
> > single point energy for identical structures) I get the same trend as
> > above (both mpi/openmp with domain/particle decomposition). Surely
> > there shouldn't be such a large difference in energy for a single
> > point calculation?
>
> That depends. Are you using the same .mdp file, just setting "nsteps = 0"?
> If so, that's not a good test.
> EM algorithms will make a change at step 0, the magnitude of which will
> again reflect the differences you're seeing. If you use the md integrator
> with a zero-step evaluation, that's a better test.
>
> -Justin
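P.S. For anyone finding this thread in the archives: the zero-step check
Mark and Justin describe would look something like the following (file
names are placeholders, and sp.mdp is just a copy of the original .mdp with
only the integrator and step count changed). On 4.5.x:

    # in sp.mdp, starting from the original .mdp, set:
    #   integrator = md
    #   nsteps     = 0
    grompp -f sp.mdp -c conf.gro -p topol.top -o sp.tpr
    mdrun -s sp.tpr -rerun conf.gro          # single-point energies for conf.gro
    echo Potential | g_energy -f ener.edr    # print the potential energy term

Because no minimization step is taken, any remaining difference between
serial and parallel runs (like the dispersion-correction scaling above)
points at a real problem rather than at the minimizer.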

