jimkress_58 wrote:
If you turn off dlb this should not happen. Please try it and report if you see the same effect without.

No, I do not see the same effect if I turn off dlb.  However, I am concerned
that the magnitude of the differences between runs exceeds the expected,
normal variability (as defined by the RMS deviations of each run), so I am
exploring that.
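
One way to put numbers on this concern is sketched below. It uses only the Total Energy values printed in the two md.log excerpts quoted further down (steps 0 to 600) and compares the run-to-run difference at each logged step with the spread of each run over those same steps. The variable names are made up for illustration, and seven samples per run is far too few for a serious fluctuation estimate, so treat it as a rough check rather than an analysis.

```python
from statistics import pstdev

# Total Energy (kJ/mol) at steps 0, 100, ..., 600, copied from the md.log excerpts.
steps = [0, 100, 200, 300, 400, 500, 600]
run1 = [-1.80336e+04, -1.82203e+04, -1.82281e+04, -1.83095e+04,
        -1.81775e+04, -1.79916e+04, -1.79972e+04]
run2 = [-1.80336e+04, -1.82204e+04, -1.82288e+04, -1.82978e+04,
        -1.81896e+04, -1.80746e+04, -1.82728e+04]

# Run-to-run difference at each logged step.
diffs = [e2 - e1 for e1, e2 in zip(run1, run2)]

# Spread of each run over the logged steps (population standard deviation).
spread1 = pstdev(run1)
spread2 = pstdev(run2)

for step, d in zip(steps, diffs):
    print(f"step {step:4d}: run2 - run1 = {d:9.1f} kJ/mol")
print(f"spread over logged steps: run 1 {spread1:.1f}, run 2 {spread2:.1f} kJ/mol")
```

The differences start at zero and grow with step number, which is the pattern to expect when two trajectories diverge. On a longer run the same comparison would be better done on the full energy file than on log excerpts.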

Also, if I turn on nosum, as suggested by mdrun, the run with dlb turned on
diverges.  This is also a cause for concern.
No, that's expected; see David's reply below. The nosum option only reduces communication and thereby improves performance.

/Erik
Jim

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of David van der Spoel
Sent: Sunday, June 07, 2009 3:20 AM
To: Discussion list for GROMACS users
Subject: Re: [gmx-users] Nonrepeatable results for gromacs 4.0.5

Jim Kress wrote:
I've been doing multiple runs using gromacs v 4.0.5 mdrun and a constant
topol.tpr input file.  Unfortunately, the results that I get in my md.log
differ from run to run.

This is due to dynamic load balancing. Due to fluctuations in the CPU usage (e.g. due to operating system) your load will vary on each CPU and gromacs will try to balance it. Hence you get numerical differences because in a computer (a+b)+c != a+(b+c), and ultimately the trajectories will diverge.

If you turn off dlb this should not happen. Please try it and report if you see the same effect without.
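
To make the (a+b)+c != a+(b+c) point concrete, here is a minimal Python sketch. It is not GROMACS code: the helper names and the toy values are invented for illustration, and the two-chunk split only stands in for the way domain decomposition hands different terms to different nodes.

```python
def naive_sum(values):
    """Add floats strictly left to right, as a plain loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def partitioned_sum(values, split):
    """Sum two chunks separately, then combine the partial sums."""
    return naive_sum([naive_sum(values[:split]), naive_sum(values[split:])])


# Regrouping three ordinary numbers already changes the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right, left, right)

# Same four terms, two different chunk boundaries.
terms = [1e16, -1e16, 1.0, 1.0]
print(partitioned_sum(terms, 2))  # chunks [1e16, -1e16] and [1.0, 1.0]
print(partitioned_sum(terms, 1))  # chunks [1e16] and [-1e16, 1.0, 1.0]
```

With the split at 2 the small terms survive and the result is 2.0; with the split at 1 they are absorbed by the large term and the result is 0.0. In an MD run the discrepancy per step is far smaller, but because the dynamics are chaotic it grows until the trajectories differ.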

For example,
Run 1

Started mdrun on node 0 Fri May 22 22:53:51 2009

           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    1.95406e+02    1.04746e+02    4.97704e+01    4.13260e+01    1.40158e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.60139e+03   -2.64656e+04   -2.20714e+04    4.03780e+03   -1.80336e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.03142e+02   -8.46977e+02    1.92470e-05

DD  step 9 load imb.: force 29.9%

At step 10 the performance loss due to force load imbalance is 8.6 %

NOTE: Turning on dynamic load balancing

DD  step 99  vol min/aver 0.731  load imb.: force  6.9%

           Step           Time         Lambda
            100        0.20000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.05310e+02    1.30129e+02    5.63474e+01    1.81814e+01    1.44270e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.69491e+03   -2.69624e+04   -2.24148e+04    4.19456e+03   -1.82203e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.14910e+02   -5.19031e+02    1.76248e-05

DD  load balancing is limited by minimum cell size in dimension Y
DD  step 199  vol min/aver 0.766! load imb.: force 10.7%

           Step           Time         Lambda
            200        0.40000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.20550e+02    1.09068e+02    6.93319e+01    5.32511e+01    1.43458e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.78241e+03   -2.70319e+04   -2.23627e+04    4.13455e+03   -1.82281e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.10405e+02   -5.01205e+02    1.70105e-05

DD  load balancing is limited by minimum cell size in dimension Y
DD  step 299  vol min/aver 0.750! load imb.: force  3.3%

           Step           Time         Lambda
            300        0.60000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.17474e+02    8.65489e+01    5.24995e+01    4.72592e+01    1.44419e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    3.17643e+03   -2.72841e+04   -2.22597e+04    3.95024e+03   -1.83095e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    2.96568e+02    1.40098e+03    1.55861e-05

DD  step 399  vol min/aver 0.700  load imb.: force  5.9%

           Step           Time         Lambda
            400        0.80000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.43143e+02    9.93116e+01    7.16796e+01    4.63666e+01    1.46722e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.84150e+03   -2.70065e+04   -2.22372e+04    4.05976e+03   -1.81775e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.04791e+02    2.48551e+02    1.61141e-05

DD  step 499  vol min/aver 0.678  load imb.: force  6.6%

           Step           Time         Lambda
            500        1.00000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.19638e+02    8.98359e+01    8.99946e+01    5.16612e+01    1.46338e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.80267e+03   -2.68507e+04   -2.21335e+04    4.14195e+03   -1.79916e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.10961e+02   -1.17210e+02    1.71420e-05

DD  step 599  vol min/aver 0.678  load imb.: force  6.7%

           Step           Time         Lambda
            600        1.20000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.32938e+02    1.04322e+02    7.11343e+01    2.16046e+01    1.45770e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    3.07425e+03   -2.71320e+04   -2.21700e+04    4.17285e+03   -1.79972e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.13281e+02    5.60002e+01    1.97532e-05

DD  step 699  vol min/aver 0.664  load imb.: force 13.1%


----------------------------------------------------------------------------

Run 2

Step 0 is the same, but then the results start to differ more and more:

Started mdrun on node 0 Sat Jun  6 14:38:03 2009

           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    1.95406e+02    1.04746e+02    4.97704e+01    4.13260e+01    1.40158e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.60139e+03   -2.64656e+04   -2.20714e+04    4.03780e+03   -1.80336e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.03142e+02   -8.46977e+02    1.92470e-05

DD  step 9 load imb.: force 32.9%

At step 10 the performance loss due to force load imbalance is 8.8 %

NOTE: Turning on dynamic load balancing

DD  load balancing is limited by minimum cell size in dimension Y
DD  step 99  vol min/aver 0.711! load imb.: force 13.3%

           Step           Time         Lambda
            100        0.20000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.05314e+02    1.30130e+02    5.63508e+01    1.81808e+01    1.44270e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.69491e+03   -2.69627e+04   -2.24151e+04    4.19468e+03   -1.82204e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.14919e+02   -5.13520e+02    1.76037e-05

DD  load balancing is limited by minimum cell size in dimension Y Z
DD  step 199  vol min/aver 0.760! load imb.: force 12.7%

           Step           Time         Lambda
            200        0.40000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.20600e+02    1.09011e+02    6.92931e+01    5.32915e+01    1.43453e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.78045e+03   -2.70297e+04   -2.23626e+04    4.13378e+03   -1.82288e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.10348e+02   -5.07193e+02    1.69736e-05

DD  load balancing is limited by minimum cell size in dimension Y
DD  step 299  vol min/aver 0.757! load imb.: force 12.1%

           Step           Time         Lambda
            300        0.60000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.18647e+02    8.76939e+01    5.26630e+01    4.67556e+01    1.44438e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    3.15118e+03   -2.72121e+04   -2.22108e+04    3.91294e+03   -1.82978e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    2.93768e+02    1.36397e+03    1.56756e-05

DD  load balancing is limited by minimum cell size in dimension Y Z
DD  step 399  vol min/aver 0.688! load imb.: force 12.6%

           Step           Time         Lambda
            400        0.80000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.37290e+02    9.91231e+01    6.10010e+01    3.87031e+01    1.46621e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.68805e+03   -2.68308e+04   -2.22404e+04    4.05083e+03   -1.81896e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.04120e+02   -2.55369e+02    1.63518e-05

DD  load balancing is limited by minimum cell size in dimension Z
DD  step 499  vol min/aver 0.677! load imb.: force 10.1%

           Step           Time         Lambda
            500        1.00000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.30361e+02    8.47035e+01    8.84842e+01    4.44614e+01    1.44045e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.91452e+03   -2.70665e+04   -2.22635e+04    4.18886e+03   -1.80746e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.14483e+02    1.47268e+02    1.75008e-05

DD  load balancing is limited by minimum cell size in dimension Z
DD  step 599  vol min/aver 0.692! load imb.: force  7.7%

           Step           Time         Lambda
            600        1.20000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.19896e+02    9.93832e+01    6.10071e+01    2.95745e+01    1.45874e+03
        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.   Total Energy
    2.81555e+03   -2.71300e+04   -2.24458e+04    4.17303e+03   -1.82728e+04
    Temperature Pressure (bar)  Cons. rmsd ()
    3.13294e+02   -3.05949e+02    1.64990e-05

DD  load balancing is limited by minimum cell size in dimension Z
DD  step 699  vol min/aver 0.719! load imb.: force  4.9%


----------------------------------------------------------------------------

Any ideas why I am seeing this?

Here is the initial mdrun printed input info:


                         :-)  G  R  O  M  A  C  S  (-:

                   Groningen Machine for Chemical Simulation

                            :-)  VERSION 4.0.5  (-:


      Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.
       Copyright (c) 1991-2000, University of Groningen, The Netherlands.
             Copyright (c) 2001-2008, The GROMACS development team,
            check out http://www.gromacs.org for more information.

         This program is free software; you can redistribute it and/or
          modify it under the terms of the GNU General Public License
         as published by the Free Software Foundation; either version 2
             of the License, or (at your option) any later version.

                              :-)  mdrun_mpi  (-:


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable
molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and
H. J. C. Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

parameters of the run:
   integrator           = md
   nsteps               = 5000000
   init_step            = 0
   ns_type              = Grid
   nstlist              = 10
   ndelta               = 2
   nstcomm              = 1
   comm_mode            = Linear
   nstlog               = 100
   nstxout              = 50
   nstvout              = 0
   nstfout              = 0
   nstenergy            = 100
   nstxtcout            = 0
   init_t               = 0
   delta_t              = 0.002
   xtcprec              = 1000
   nkx                  = 0
   nky                  = 0
   nkz                  = 0
   pme_order            = 4
   ewald_rtol           = 1e-05
   ewald_geometry       = 0
   epsilon_surface      = 0
   optimize_fft         = FALSE
   ePBC                 = xyz
   bPeriodicMols        = FALSE
   bContinuation        = FALSE
   bShakeSOR            = FALSE
   etc                  = Berendsen
   epc                  = No
   epctype              = Isotropic
   tau_p                = 0.5
   ref_p (3x3):
      ref_p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref_p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref_p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   compress (3x3):
      compress[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compress[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compress[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord_scaling     = No
   posres_com (3):
      posres_com[0]= 0.00000e+00
      posres_com[1]= 0.00000e+00
      posres_com[2]= 0.00000e+00
   posres_comB (3):
      posres_comB[0]= 0.00000e+00
      posres_comB[1]= 0.00000e+00
      posres_comB[2]= 0.00000e+00
   andersen_seed        = 815131
   rlist                = 1
   rtpi                 = 0.05
   coulombtype          = Cut-off
   rcoulomb_switch      = 0
   rcoulomb             = 1
   vdwtype              = Cut-off
   rvdw_switch          = 0
   rvdw                 = 1
   epsilon_r            = 1
   epsilon_rf           = 1
   tabext               = 1
   implicit_solvent     = No
   gb_algorithm         = Still
   gb_epsilon_solvent   = 80
   nstgbradii           = 1
   rgbradii             = 2
   gb_saltconc          = 0
   gb_obc_alpha         = 1
   gb_obc_beta          = 0.8
   gb_obc_gamma         = 4.85
   sa_surface_tension   = 2.092
   DispCorr             = No
   free_energy          = no
   init_lambda          = 0
   sc_alpha             = 0
   sc_power             = 0
   sc_sigma             = 0.3
   delta_lambda         = 0
   nwall                = 0
   wall_type            = 9-3
   wall_atomtype[0]     = -1
   wall_atomtype[1]     = -1
   wall_density[0]      = 0
   wall_density[1]      = 0
   wall_ewald_zfac      = 3
   pull                 = no
   disre                = No
   disre_weighting      = Conservative
   disre_mixed          = FALSE
   dr_fc                = 1000
   dr_tau               = 0
   nstdisreout          = 100
   orires_fc            = 0
   orires_tau           = 0
   nstorireout          = 100
   dihre-fc             = 1000
   em_stepsize          = 0.01
   em_tol               = 10
   niter                = 20
   fc_stepsize          = 0
   nstcgsteep           = 1000
   nbfgscorr            = 10
   ConstAlg             = Lincs
   shake_tol            = 0.0001
   lincs_order          = 4
   lincs_warnangle      = 30
   lincs_iter           = 1
   bd_fric              = 0
   ld_seed              = 1993
   cos_accel            = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   userint1             = 0
   userint2             = 0
   userint3             = 0
   userint4             = 0
   userreal1            = 0
   userreal2            = 0
   userreal3            = 0
   userreal4            = 0
grpopts:
   nrdf:     284.733     2919.27
   ref_t:         300         300
   tau_t:         0.1         0.1
anneal:          No          No
ann_npoints:           0           0
   acc:            0           0           0
   nfreeze:           N           N           N
   energygrp_flags[  0]: 0
   efield-x:
      n = 0
   efield-xt:
      n = 0
   efield-y:
      n = 0
   efield-yt:
      n = 0
   efield-z:
      n = 0
   efield-zt:
      n = 0
   bQMMM                = FALSE
   QMconstraints        = 0
   QMMMscheme           = 0
   scalefactor          = 1
qm_opts:
   ngQM                 = 0

Initializing Domain Decomposition on 12 nodes
Dynamic load balancing: auto
Will sort the charge groups at every domain (re)decomposition
Initial maximum inter charge-group distances:
    two-body bonded interactions: 0.597 nm, LJ-14, atoms 5 18
  multi-body bonded interactions: 0.597 nm, Proper Dih., atoms 5 18
Minimum cell size due to bonded interactions: 0.657 nm
Maximum distance for 5 constraints, at 120 deg. angles, all-trans: 0.820 nm
Estimated maximum distance required for P-LINCS: 0.820 nm
This distance will limit the DD cell size, you can override this with -rcon
Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25
Optimizing the DD grid for 12 cells with a minimum initial size of 1.025 nm
The maximum allowed number of cells is: X 2 Y 3 Z 2
Domain decomposition grid 2 x 3 x 2, separate PME nodes 0
Domain decomposition nodeid 0, coordinates 0 0 0

Table routines are used for coulomb: FALSE
Table routines are used for vdw:     FALSE
Cut-off's:   NS: 1   Coulomb: 1   LJ: 1
System total charge: 1.000
Generated table with 1000 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1000 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1000 data points for 1-4 LJ12.
Tabscale = 500 points/nm

Enabling SPC water optimization for 487 molecules.

Configuring nonbonded kernels...
Testing x86_64 SSE support... present.


Removing pbc first time

Initializing Parallel LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess
P-LINCS: A Parallel Linear Constraint Solver for molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 116-122
-------- -------- --- Thank You --- -------- --------

The number of constraints is 144
There are inter charge-group constraints, will communicate selected
coordinates each lincs iteration

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid
Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------


Linking all bonded interactions to atoms

The initial number of communication pulses is: X 1 Y 1 Z 1
The initial domain decomposition cell size is: X 1.21 nm Y 1.05 nm Z 1.11 nm
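
The numbers in the domain decomposition report above can be checked by hand. The sketch below is not mdrun's actual algorithm: it assumes that the box edge in each dimension is roughly the reported initial cell size times the number of cells, and that the cell limit is the box edge divided by the minimum initial cell size, rounded down. All names are hypothetical.

```python
# Values copied from the log above.
plincs_distance = 0.820          # nm, estimated maximum distance for P-LINCS
dds = 0.8                        # option -dds
grid = (2, 3, 2)                 # domain decomposition grid
cell_size = (1.21, 1.05, 1.11)   # nm, initial cell size in X, Y, Z

# "Scaling the initial minimum size with 1/0.8 (option -dds) = 1.25"
min_initial_size = plincs_distance / dds   # about 1.025 nm

# Assumption: box edge ~ cells * cell size (the log values are rounded).
box = [n * c for n, c in zip(grid, cell_size)]

# Assumption: maximum cells per dimension = floor(box edge / minimum size).
max_cells = [int(edge // min_initial_size) for edge in box]

print(f"minimum initial size: {min_initial_size:.3f} nm")
print("approximate box (nm):", [round(edge, 2) for edge in box])
print("maximum cells X Y Z:", max_cells)
```

Under these assumptions the result agrees with the reported limit of X 2 Y 3 Z 2, whose product gives the 12 cells requested.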

The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           1.000 nm
            two-body bonded interactions  (-rdd)   1.000 nm
          multi-body bonded interactions  (-rdd)   1.000 nm
  atoms separated by up to 5 constraints  (-rcon)  1.054 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 1 Y 2 Z 1
The minimum size for domain decomposition cells is 0.826 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.82 Y 0.78 Z 0.90
The maximum allowed distance for charge groups involved in interactions is:
                 non-bonded interactions           1.000 nm
            two-body bonded interactions  (-rdd)   1.000 nm
          multi-body bonded interactions  (-rdd)   0.826 nm
  atoms separated by up to 5 constraints  (-rcon)  0.826 nm


Making 3D domain decomposition grid 2 x 3 x 2, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  rest

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------

There are: 1604 Atoms
Charge group distribution at step 0: 45 50 45 42 46 41 44 45 41 47 51 47
Grid: 4 x 4 x 4 cells

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)
RMS relative constraint deviation after constraining: 2.38e-05
Initial temperature: 299.151 K

Which is, of course, identical between the runs.

Thanks for any comments/ advice.

Jim

_______________________________________________
gmx-users mailing list    [email protected]
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!
Please don't post (un)subscribe requests to the list. Use the www interface or send it to [email protected].
Can't post? Read http://www.gromacs.org/mailing_lists/users.php




--
-----------------------------------------------
Erik Marklund, PhD student
Laboratory of Molecular Biophysics,
Dept. of Cell and Molecular Biology, Uppsala University.
Husargatan 3, Box 596,    75124 Uppsala, Sweden
phone:    +46 18 471 4537        fax: +46 18 511 755
[email protected]    http://xray.bmc.uu.se/molbiophys
