On Tue, Jan 20, 2015 at 11:48 PM, Jiaqi Lin <jq...@mit.edu> wrote:
Hi Szilard,
- I've tried 5.0.1 and it gives the same result. So 4.6.7 or 5.0.4 is
better, but in what way?
- I've tried the Verlet scheme and it gives only a small change of cutoff
and grid. But what I'm really interested in is manually reproducing the
result that tune_pme gave me in the first case using the Group scheme.
If that's the only situation you can observe it, it could be an outlier or
your method could be unsuitable...
- I've also tried lincs-order=4 and it makes no difference.
- My Fourier spacing is 25% finer, but that shouldn't affect the results,
right? If it does affect the results, then I want to find out how.
It affects your results because you could do some more sampling with the
computer time you are spending on PME at the moment. Where to choose your
balance of model quality vs amount of sampling is poorly understood, and
problem-dependent.
- I happen to use the PP/PME rank split (100+28) and it gives me
interesting results (the performance is actually not bad). So I'm
very interested in how these cutoff and grid settings can affect my
simulation results.
If the implementation and model are right, then they have no effect. That's
why we auto-tune with them. You're going to need a lot of replicates to
show that -npme 28 gives a statistically different result from a different
value, and you won't yet have established that it's somehow a more valid
observation to pursue.
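The point about replicates can be made concrete with a two-sample comparison over independent runs. The following is a minimal Python sketch, assuming each replicate has been reduced to one scalar observable (for example a run-averaged nanoparticle-bilayer distance). The function name `welch_t` and the numbers in the usage lines are hypothetical placeholders, not results from this thread.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("need at least two replicates per condition")
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, dof


# Hypothetical placeholder values: one number per replicate run.
npme28 = [1.0, 2.0, 3.0, 4.0]
other = [2.0, 3.0, 4.0, 5.0]
t, dof = welch_t(npme28, other)
print(f"t = {t:.3f}, dof = {dof:.1f}")
```

With only a handful of replicates per condition the statistic is usually small compared with the run-to-run scatter, which is one way to see why many replicates are needed before claiming that one -npme value gives a different result from another.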
So I tried to manually control the parameters (turning off tune_PME). But no
matter what I tried, I could not reproduce the result given by tune_pme. So my
biggest question is: how is tune_PME implemented in the code? What
parameters does it actually tune?
Like it says, it varies rcoulomb and the Fourier grid, keeping rvdw and
beta fixed. Details of how rlist and rlistlong behave are a bit messy, but
nothing you asked for is ever missed out.
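How rcoulomb and the grid move together can be checked against the "timed with pme grid" lines of the md.log quoted further down in this thread. The sketch below is only an observation about that one log, not a description of the mdrun source: it multiplies each trial cut-off by the number of grid points along x and along z. The starting line (280 280 384 at 1.400) is left out because it is the user-specified setup rather than a tuned trial, and the function name `cutoff_times_points` is made up for this example.

```python
# (nx, ny, nz, cutoff_nm) copied from the "timed with pme grid" lines in md.log
trials = [
    (256, 256, 324, 1.464),
    (224, 224, 300, 1.626),
    (200, 200, 280, 1.822),
    (160, 160, 208, 2.280),
    (144, 144, 192, 2.530),
    (128, 128, 160, 2.964),
    (112, 112, 144, 3.294),
    (84, 84, 108, 4.392),
    (128, 128, 192, 2.846),
    (120, 120, 160, 3.036),
    (112, 112, 160, 3.253),
]


def cutoff_times_points(trials):
    """Return (cutoff*nx, cutoff*nz) per trial. A constant product along one
    axis means the cut-off is proportional to the grid spacing on that axis."""
    return [(rc * nx, rc * nz) for nx, _, nz, rc in trials]


products = cutoff_times_points(trials)
for (nx, ny, nz, rc), (px, pz) in zip(trials, products):
    print(f"{nx:3d} {ny:3d} {nz:3d}  rc={rc:.3f}  rc*nx={px:6.1f}  rc*nz={pz:6.1f}")
```

In every trial either rc*nx comes out near 364 or rc*nz near 474, with the other product larger. That is what one would expect if the cut-off is scaled in proportion to the coarsest grid spacing of each trial grid, so that cut-off and grid change by the same factor.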
- When PME tuned the cutoff to such a large value, the speed did not go
down noticeably. So what I suspect is that tune_PME
Please be careful with names. gmx tune_pme is a different thing from the
mdrun auto-tuning.
does the direct space calculation without changing the neighbor list
search distance.
Your auto-tuned run number 1 had rlist = rcoulomb at the start, so mdrun
knows you wanted a PME model with an unbuffered list whose size equals
rcoulomb, and a buffered VDW model with rlist 1.4 and rvdw 1.2. Thus,
rlistlong will stay equal to rcoulomb as it changes. The details and code
are horrible, though, and I am looking forward to nothing so much as
ripping it all out in about 2 months!
And like Szilard suggested, your runs are probably a long way from maximum
throughput. Aim for lots of sampling, don't chase replicating rare events
with brute-force simulation!
Mark
Thank you
Best
Jiaqi
On 1/20/2015 3:54 PM, Szilárd Páll wrote:
Not (all) directly related, but a few comments/questions:
- Have you tried 4.6.7 or 5.0.4?
- Have you considered using the Verlet scheme instead of doing manual
buffering?
- lincs-order=8 is very large for 2fs production runs - typically 4 is
used.
- Your fourier spacing is a lot (~25%) finer than it needs to be.
- The PP/PME rank split of 100+28 is _very_ inconvenient and it is the
main cause of the horrible PME performance together with the overly
fine grid. That's why you get such a huge cut-off after the PP-PME
load balancing. Even if you want to stick to these parameters, you
should tune the rank split (manually or with tune_pme).
- The above contributes to the high neighbor search cost too.
--
Szilárd
On Tue, Jan 20, 2015 at 9:18 PM, Jiaqi Lin <jq...@mit.edu> wrote:
Hi Mark,
Thanks for the reply. I put the md.log files at the following link:
https://www.dropbox.com/sh/d1d2fbwreizr974/AABYhSRU03nmijbTIXKKr-rra?dl=0
There are four log files:
1. GMX 4.6.5 -tunepme (the Coulomb cutoff is tuned to 3.253)
2. GMX 4.6.5 -notunepme, rcoulomb = 3.3, fourierspacing = 0.33
3. GMX 4.6.5 -notunepme, rcoulomb = 3.3, fourierspacing = 0.14
4. GMX 4.6.5 -notunepme, rcoulomb = 1.4, fourierspacing = 0.14
Note that the LR Coulombic energy in the first one is almost twice the value
of that in the second one, whereas the grid spacing in both cases is nearly
the same.
Only the first one gives a strong electrostatic interaction of a
nanoparticle with a lipid bilayer under ionic imbalance. In the other cases
I do not observe such a strong interaction.
GMX 5.0.1 gives the same results as GMX 4.6.5 using the Group cutoff scheme. Thanks
Regards
Jiaqi
On 1/19/2015 3:22 PM, Mark Abraham wrote:
On Thu, Jan 15, 2015 at 3:21 AM, Jiaqi Lin <jq...@mit.edu> wrote:
Dear GMX developers,
I've encountered a problem in GROMACS concerning the auto-tuning feature of
PME that has bugged me for months. As stated in the title, the auto-tuning
feature of mdrun changed my Coulomb cutoff from 1.4 nm to ~3.3 nm (stated
in md.log) when I set -npme to 28 (128 total CPU cores), and this gives
me interesting simulation results. When I use -notunepme, I find that Coulomb
(SR) and recip. give me the same energy, but the actual simulation result is
different. This I can understand: scaling between the Coulomb cut-off and the
grid size theoretically gives the same accuracy for electrostatics (according
to the GMX manual and PME papers), but there is actually some numerical error
due to grid mapping, and even if the energy is the same, that does not mean
the system configuration has to be the same (NVE ensemble: constant energy,
different configuration).
Total electrostatic energy should be approximately the same with different
PME partitions.
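A minimal Python sketch of that check, assuming the total is taken as Coulomb (SR) plus Coul. recip. as printed in the energy block of md.log. The helper names are hypothetical, and whether other terms should be included for a fair comparison depends on the setup.

```python
def total_electrostatic(coulomb_sr, coul_recip):
    """Sum the short-range and reciprocal-space Coulomb terms (kJ/mol)."""
    return coulomb_sr + coul_recip


def relative_difference(a, b):
    """Symmetric relative difference between two energies."""
    return abs(a - b) / max(abs(a), abs(b))


# Step-1000 values from the auto-tuned md.log quoted in this thread.
tuned = total_electrostatic(-7.04736e+06, -2.32682e+05)
print(f"auto-tuned run: {tuned:.5e} kJ/mol")
```

To compare two runs, compute the same sum from the other log at the same step and pass both totals to `relative_difference`. Values from the -notunepme logs are not quoted in the thread, so none are filled in here.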
However, the thing I don't understand is the following. I am interested in
the result under a large Coulomb cut-off, so I tried to manually set the
cut-off and grid spacing with -notunepme, using the values tuned by mdrun
previously. This gives me a completely different simulation result, and the
energy is also different. I've tried setting rlist, rlistlong, or both equal
to rcoulomb (~3.3), but it still does not give me the result produced by
auto-tuning PME.
In what sense is the result different?
In addition, the simulation speed drops dramatically when I set rcoulomb to
~3.3 (using -tunepme the speed remains nearly the same no matter how
large the cutoff is tuned to). I've tested this in both GMX 4.6.5 and
5.0.1 and the same thing happens, so clearly it's not because of the version.
Thus the question is: what exactly happens to the PME calculation when using
the auto-tuning feature in mdrun, and why does it give different results when
I manually set the Coulomb cutoff and grid spacing to the values tuned by
mdrun without the auto-tuning feature (using -notunepme)? Thank you for your
help.
For the group scheme, these should all lead to essentially the same result
and (if tuned) performance. If you can share your various log files on a
file-sharing service (rc 1.4, rc 3.3, various -tunepme settings, 4.6.5 and
5.0.1) then we can be in a position to comment further.
Mark
Additional info: I use the Group cutoff-scheme; rvdw is 1.2.
md.log file:
DD step 9 load imb.: force 29.4% pme mesh/force 3.627
step 30: timed with pme grid 280 280 384, coulomb cutoff 1.400: 1026.4 M-cycles
step 50: timed with pme grid 256 256 324, coulomb cutoff 1.464: 850.3 M-cycles
step 70: timed with pme grid 224 224 300, coulomb cutoff 1.626: 603.6 M-cycles
step 90: timed with pme grid 200 200 280, coulomb cutoff 1.822: 555.2 M-cycles
step 110: timed with pme grid 160 160 208, coulomb cutoff 2.280: 397.0 M-cycles
step 130: timed with pme grid 144 144 192, coulomb cutoff 2.530: 376.0 M-cycles
step 150: timed with pme grid 128 128 160, coulomb cutoff 2.964: 343.7 M-cycles
step 170: timed with pme grid 112 112 144, coulomb cutoff 3.294: 334.8 M-cycles
Grid: 12 x 14 x 14 cells
step 190: timed with pme grid 84 84 108, coulomb cutoff 4.392: 346.2 M-cycles
step 190: the PME grid restriction limits the PME load balancing to a coulomb cut-off of 4.392
step 210: timed with pme grid 128 128 192, coulomb cutoff 2.846: 360.6 M-cycles
step 230: timed with pme grid 128 128 160, coulomb cutoff 2.964: 343.6 M-cycles
step 250: timed with pme grid 120 120 160, coulomb cutoff 3.036: 340.4 M-cycles
step 270: timed with pme grid 112 112 160, coulomb cutoff 3.253: 334.3 M-cycles
step 290: timed with pme grid 112 112 144, coulomb cutoff 3.294: 334.7 M-cycles
step 310: timed with pme grid 84 84 108, coulomb cutoff 4.392: 348.0 M-cycles
optimal pme grid 112 112 160, coulomb cutoff 3.253
DD step 999 load imb.: force 18.4% pme mesh/force 0.918
At step 1000 the performance loss due to force load imbalance is 6.3 %
NOTE: Turning on dynamic load balancing
           Step           Time         Lambda
           1000       20.00000        0.00000

   Energies (kJ/mol)
           Bond       G96Angle        LJ (SR)   Coulomb (SR)   Coul. recip.
    1.98359e+05    1.79181e+06   -1.08927e+07   -7.04736e+06   -2.32682e+05
 Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
    6.20627e+04   -1.61205e+07    4.34624e+06   -1.17743e+07    3.00659e+02
 Pressure (bar)   Constr. rmsd
    2.13582e+00    1.74243e-04
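The tuning lines in this log can be read mechanically. The following is a minimal Python sketch that parses the "timed with pme grid" lines (copied from the excerpt above, one entry per line) and picks the setting with the fewest M-cycles. Taking a plain minimum is an assumption made for illustration; mdrun's actual selection logic may be more involved. The names `PATTERN` and `parse_trials` are made up for this example.

```python
import re

LOG = """\
step 30: timed with pme grid 280 280 384, coulomb cutoff 1.400: 1026.4 M-cycles
step 50: timed with pme grid 256 256 324, coulomb cutoff 1.464: 850.3 M-cycles
step 70: timed with pme grid 224 224 300, coulomb cutoff 1.626: 603.6 M-cycles
step 90: timed with pme grid 200 200 280, coulomb cutoff 1.822: 555.2 M-cycles
step 110: timed with pme grid 160 160 208, coulomb cutoff 2.280: 397.0 M-cycles
step 130: timed with pme grid 144 144 192, coulomb cutoff 2.530: 376.0 M-cycles
step 150: timed with pme grid 128 128 160, coulomb cutoff 2.964: 343.7 M-cycles
step 170: timed with pme grid 112 112 144, coulomb cutoff 3.294: 334.8 M-cycles
step 190: timed with pme grid 84 84 108, coulomb cutoff 4.392: 346.2 M-cycles
step 210: timed with pme grid 128 128 192, coulomb cutoff 2.846: 360.6 M-cycles
step 230: timed with pme grid 128 128 160, coulomb cutoff 2.964: 343.6 M-cycles
step 250: timed with pme grid 120 120 160, coulomb cutoff 3.036: 340.4 M-cycles
step 270: timed with pme grid 112 112 160, coulomb cutoff 3.253: 334.3 M-cycles
step 290: timed with pme grid 112 112 144, coulomb cutoff 3.294: 334.7 M-cycles
step 310: timed with pme grid 84 84 108, coulomb cutoff 4.392: 348.0 M-cycles
"""

PATTERN = re.compile(
    r"step\s+(\d+): timed with pme grid (\d+) (\d+) (\d+), "
    r"coulomb cutoff ([\d.]+): ([\d.]+) M-cycles"
)


def parse_trials(text):
    """Return a list of (step, (nx, ny, nz), cutoff_nm, mcycles) tuples."""
    trials = []
    for m in PATTERN.finditer(text):
        step, nx, ny, nz = (int(m.group(i)) for i in range(1, 5))
        trials.append((step, (nx, ny, nz), float(m.group(5)), float(m.group(6))))
    return trials


trials = parse_trials(LOG)
best = min(trials, key=lambda t: t[3])  # fewest M-cycles per step
print(best)
```

The fastest entry agrees with the "optimal pme grid 112 112 160, coulomb cutoff 3.253" line. The timings at 3.253 and 3.294 differ by well under 1 M-cycle, so in this log the choice between those two settings rests on a very small margin.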
Best
Jiaqi
--
Jiaqi Lin
postdoc fellow
The Langer Lab
--
Gromacs Users mailing list
* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
send a mail to gmx-users-requ...@gromacs.org.
--
Jiaqi Lin
postdoc fellow
The Langer Lab
David H. Koch Institute for Integrative Cancer Research
Massachusetts Institute of Technology