Message: 4
Date: Wed, 05 Dec 2007 14:19:28 +0100
From: "Berk Hess" <[EMAIL PROTECTED]>
Subject: Re: [gmx-users] different results when using different number
        cpus
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; format=flowed

Hi,

With Gromacs and (nearly) all other MD packages you will never be able
to get binary identical results when running on different numbers of CPUs.
Since MD is chaotic, the results can be very different.

Berk.
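Berk's point that MD is chaotic can be illustrated with a toy model. The sketch below is not an MD integrator: it uses the logistic map x -> 4x(1 - x), a standard textbook chaotic system, as a stand-in. Two runs start 1e-12 apart, roughly the size of a last-digit rounding difference, and the separation on average doubles every step until the two runs are completely uncorrelated.

```python
def logistic_trajectory(x0: float, steps: int) -> list[float]:
    """Iterate the chaotic logistic map x -> 4 x (1 - x), a toy stand-in for MD."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(4.0 * x * (1.0 - x))
    return xs


def separations(x0: float, delta: float, steps: int) -> list[float]:
    """Absolute difference, step by step, between two runs started delta apart."""
    a = logistic_trajectory(x0, steps)
    b = logistic_trajectory(x0 + delta, steps)
    return [abs(u - v) for u, v in zip(a, b)]


if __name__ == "__main__":
    seps = separations(0.3, 1e-12, 60)
    for step in (0, 10, 20, 30, 40, 50, 60):
        print(f"step {step:2d}: separation = {seps[step]:.3e}")
```

A real MD system behaves analogously, only with far more degrees of freedom, which is why runs can be compared through averaged properties but not trajectory by trajectory.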

I can confirm the same behaviour: repeating a simulation segment twice on 4 CPUs with gromacs-3.3.1 and fftw-3.1.2 does not give identical results. Furthermore, while trying to debug a colleague's parameters, which give a LINCS error after long periods of simulation time on a single processor, I found that a proper restart from just prior to the crash does not reproduce the error exactly (although an error does eventually occur). This was unfortunate, since my plan was to save the .trr every 100 ps and then do a restart in which I saved the .xtc every integration step to get a good look at the problem. Carsten's comments about FFTW 3.x are useful, since I have been using fftw-3.1.2. Note that I did not test whether a run on 1 CPU generates an identical trajectory, only that the LINCS error is not exactly reproduced. I did the restart using the .trr/.edr files and set gen_vel = no and unconstrained_start = yes.
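The restart plan above mostly comes down to converting output intervals into step counts. A minimal sketch, assuming a 2 fs time step (the post does not state dt), a hypothetical 10 ps .xtc interval for the production run, and the GROMACS 3.3 .mdp option names nstxout, nstvout and nstxtcout; the helper functions themselves are hypothetical and not part of GROMACS.

```python
def steps_per_interval(interval_ps: float, dt_ps: float) -> int:
    """Convert an output interval in ps into a whole number of MD steps."""
    ratio = interval_ps / dt_ps
    steps = round(ratio)
    if steps < 1 or abs(ratio - steps) > 1e-6:
        raise ValueError(f"{interval_ps} ps is not a whole number of {dt_ps} ps steps")
    return steps


def output_settings(dt_ps: float, trr_every_ps: float, xtc_every_ps: float) -> dict:
    """Output frequencies using GROMACS 3.3 .mdp option names."""
    trr_steps = steps_per_interval(trr_every_ps, dt_ps)
    return {
        "nstxout": trr_steps,  # full-precision coordinates in the .trr
        "nstvout": trr_steps,  # velocities in the .trr, so a restart can use gen_vel = no
        "nstxtcout": steps_per_interval(xtc_every_ps, dt_ps),  # compressed .xtc frames
    }


if __name__ == "__main__":
    dt = 0.002  # assumed 2 fs time step
    production = output_settings(dt, trr_every_ps=100.0, xtc_every_ps=10.0)
    # Restart from a saved .trr frame, writing .xtc every step.
    restart = {**output_settings(dt, trr_every_ps=100.0, xtc_every_ps=dt),
               "gen_vel": "no", "unconstrained_start": "yes"}
    for name, settings in (("production", production), ("restart", restart)):
        print(f"; {name}")
        for key, value in settings.items():
            print(f"{key:<20} = {value}")
```

Even with exactly these settings, the restarted run is only guaranteed to start from the same state, not to follow the same trajectory, for the reasons discussed in this thread.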

I agree that statistical properties will be properly reproduced, but I can imagine situations in which it would be valuable for a proper restart to reproduce the trajectory exactly: e.g. studying the dynamics of fast, rare processes, where one might run for a long time while saving .xtc and .trr infrequently, and then restart at the right place while saving .xtc very frequently in order to capture the dynamics of an identified transition.



From: Carsten Kutzner <[EMAIL PROTECTED]>
Reply-To: Discussion list for GROMACS users <[email protected]>
To: Discussion list for GROMACS users <[email protected]>
Subject: Re: [gmx-users] different results when using different number cpus
Date: Wed, 05 Dec 2007 14:10:06 +0100

Hi Dechang,

it is normal that results are not binary identical if you compare the
"same" MD system on different numbers of processors. If you use PME then
you will probably get slightly different charge grids for 2 and for 16
processors - since the charge grid has to be divisible by the number of
CPUs in x- and y-direction. Even if you manually set the grid dimensions
to be the same for both cases, your simulations could diverge when using
version 3.x of FFTW. This version has a built-in timer and chooses
the fastest of several algorithms, which could differ even between two
runs on the same number of processors, depending on the timing results.
With different algorithms you get slight differences in the last digit
of the computed numbers (rounding / truncation / order of evaluation)
which will then grow during the simulation and lead to diverging
trajectories. Of course the averaged properties of the simulation are
unaffected by those differences and should be the same if averaged long
enough.
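The "order of evaluation" effect can be reproduced without any MD code. The sketch below adds the same numbers left to right and as per-CPU partial sums for 2 and 16 CPUs; the chunked sum is a hypothetical model of a parallel force or energy reduction, not GROMACS's actual communication scheme. Because floating-point addition is not associative, the totals may differ in the last bits, while all of them stay within rounding error of the correctly rounded math.fsum result.

```python
import math
import random


def naive_sum(values: list[float]) -> float:
    """Plain left-to-right accumulation (built-in sum() is compensated since Python 3.12)."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values: list[float], n_cpus: int) -> float:
    """Hypothetical parallel reduction: each 'CPU' sums a contiguous block, then partials are added."""
    size = max(1, -(-len(values) // n_cpus))  # ceiling division
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)


if __name__ == "__main__":
    # The same three numbers, grouped differently.
    print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))  # 0.6000000000000001 0.6

    rng = random.Random(2007)  # fixed seed so the run is repeatable
    forces = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]
    exact = math.fsum(forces)  # correctly rounded reference sum
    for n in (1, 2, 16):
        total = chunked_sum(forces, n)
        print(f"{n:2d} CPUs: {total!r} (off from fsum by {total - exact:.1e})")
```

Differences of this size are harmless on their own; it is the chaotic dynamics that amplifies them into visibly different trajectories.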
You could use FFTW 2.x and manually set the FFT grid size to the same
value for the 2 and 16 CPU cases, but I am not sure whether this is enough
to get binary identical results.
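To see why the default charge grids differ between 2 and 16 CPUs, here is a deliberately simplified, hypothetical grid-size rule: divide each box length by a target spacing (0.12 nm is assumed here as a commonly used value; check your own .mdp), round up, then round x and y up again to a multiple of the number of CPUs, as described above. Real GROMACS additionally prefers FFT-friendly sizes, so its actual numbers may differ. The 9 x 7 x 7 box is taken from the original question and assumed to be in nm.

```python
import math


def grid_points(length_nm: float, spacing_nm: float, n_cpus: int = 1) -> int:
    """Smallest grid size with spacing <= spacing_nm, rounded up to a multiple of n_cpus.

    Simplified, hypothetical rule for illustration; not GROMACS's actual algorithm.
    """
    n = math.ceil(length_nm / spacing_nm - 1e-9)  # tolerance guards against 75.00000000000001-style rounding
    return -(-n // n_cpus) * n_cpus  # integer ceiling to a multiple of n_cpus


def pme_grid(box_nm: tuple[float, float, float], spacing_nm: float,
             n_cpus: int) -> tuple[int, int, int]:
    """x and y must be divisible by the number of CPUs; z is left unconstrained."""
    x, y, z = box_nm
    return (grid_points(x, spacing_nm, n_cpus),
            grid_points(y, spacing_nm, n_cpus),
            grid_points(z, spacing_nm))


if __name__ == "__main__":
    box = (9.0, 7.0, 7.0)  # box from the question, assumed to be in nm
    for cpus in (2, 16):
        print(f"{cpus:2d} CPUs -> grid {pme_grid(box, 0.12, cpus)}")
```

Under these assumptions the 2-CPU run gets a 76 x 60 x 59 grid and the 16-CPU run an 80 x 64 x 59 grid, so the two runs compute slightly different reciprocal-space forces from the very first step. Setting the grid explicitly to one size valid for both CPU counts removes this particular difference, but not the FFT or summation-order differences.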
You could also repeat your simulations several times with (slightly)
different starting conditions (maybe different starting velocities) to
get a better picture of the average behaviour of your system. If in all
16-processor runs you see the proteins move apart and in all 2-processor
runs you see them come together, I would guess something is wrong.
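One way to set up such replicas is to give each run its own velocity seed. A minimal sketch, assuming the GROMACS 3.3 .mdp options gen_vel, gen_temp and gen_seed; the 300 K temperature, the seed values and the placeholder base .mdp are hypothetical, and all other settings should be copied unchanged from the original run. Any existing gen_* lines are dropped so each option appears only once.

```python
GEN_KEYS = {"gen_vel", "gen_temp", "gen_seed"}


def replica_mdp(base_mdp: str, seed: int, temperature_k: float) -> str:
    """Return base .mdp text with its gen_* lines replaced by replica-specific ones."""
    kept = [line for line in base_mdp.splitlines()
            if line.split("=")[0].strip().replace("-", "_") not in GEN_KEYS]
    kept += [
        "; replica-specific starting velocities",
        "gen_vel = yes",
        f"gen_temp = {temperature_k:g}",
        f"gen_seed = {seed}",
    ]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    base = "integrator = md\ndt = 0.002\ngen_vel = no\n"  # placeholder for your real .mdp
    for seed in (1001, 1002, 1003, 1004):  # hypothetical seeds, one per replica
        print(f"; ---- replica with gen_seed {seed} ----")
        print(replica_mdp(base, seed, 300.0))
```

Each resulting file would go through grompp exactly like the original run, once for the 2-CPU and once for the 16-CPU case, so that the protein-protein behaviour can be compared across several replicas rather than a single trajectory.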

Hope that helps,
  Carsten


Dechang Li wrote:
>  Dear all,
>
> I used Gromacs 3.3.1 to simulate two proteins in water (TIP3P).
> I ran two similar simulations, one on 2 CPUs and the other on 16 CPUs.
> The two simulations have the same .gro, .top and .mdp files, but I found
> that the results were not the same: in the 2-CPU simulation the two
> proteins moved closer and closer, whereas in the 16-CPU simulation they
> moved apart.
>    Is it normal to get different results when using different numbers of
> CPUs? The size of my simulation box is 9*7*7.
>
> Best regards,
>
> 2007-12-5
>
> =========================================
> Dechang Li, PhD Candidate
> Department of Engineering Mechanics
> Tsinghua University
> Beijing 100084
> PR China
>
> Tel:   +86-10-62773779(O)
> Email: [EMAIL PROTECTED]
> =========================================
>


_______________________________________________
gmx-users mailing list    [email protected]
http://www.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to [EMAIL PROTECTED]
Can't post? Read http://www.gromacs.org/mailing_lists/users.php
