Message: 4
Date: Wed, 05 Dec 2007 14:19:28 +0100
From: "Berk Hess" <[EMAIL PROTECTED]>
Subject: Re: [gmx-users] different results when using different number cpus
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; format=flowed

Hi,

With Gromacs and (nearly) all other MD packages you will never be able to get binary identical results when running on different numbers of CPUs. Since MD is chaotic, the results can be very different.

Berk.
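To make the "MD is chaotic" point concrete, here is a minimal sketch. It uses the logistic map as a stand-in for a chaotic system; it is not MD and not GROMACS code, and the function names and the 1e-15 perturbation are assumptions chosen purely for illustration. Two runs that start almost identically stay close for a few steps and then end up completely different.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the chaotic logistic map x -> r*x*(1-x); return every state visited."""
    states = [x0]
    for _ in range(steps):
        x = states[-1]
        states.append(r * x * (1.0 - x))
    return states


def separation(a, b):
    """Absolute difference between two trajectories at every step."""
    return [abs(x - y) for x, y in zip(a, b)]


# Two starting points that differ by roughly one part in 10^15,
# comparable to a rounding difference in the last digits of a double.
reference = logistic_trajectory(0.3, 100)
perturbed = logistic_trajectory(0.3 + 1e-15, 100)

gap = separation(reference, perturbed)
print("gap after   5 steps:", gap[5])
print("largest gap in 100 steps:", max(gap))
```

The early gap is tiny, while the largest gap over the run is of the same order as the values themselves. In the same way, a last-digit difference in an MD force can grow into a visibly different trajectory.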
I can confirm that I get the same thing when running a repeat of a simulation segment twice on 4 cpus with gromacs-3.3.1 and fftw-3.1.2. Further, while trying to debug a colleague's parameters that give a lincs error after long periods of simulation time on a single processor, I find that a proper restart from just prior to the crash does not lead to an exact repeat of the error (although an error does eventually occur). This was unfortunate, since my plan was to save the .trr every 100 ps and then do a restart in which I saved the .xtc every integration step to get a good look at the problem. Carsten's comments about fftw 3.x are useful, since I have been using fftw-3.1.2. Note that I did not test whether a run on 1 cpu will generate an identical trajectory, only that the lincs error is not exactly reproduced. I did the restart using .trr/.edr and set gen_vel=no;unconstrained_start=yes; for the restart.
I agree that statistical properties will be properly reproduced, but I can imagine situations in which one would want a proper restart to reproduce the original trajectory exactly: e.g. when studying the dynamics of quick, rare processes, one might run for a long time while saving .xtc and .trr infrequently, and then restart at the proper place while saving .xtc very frequently in order to capture the dynamics of an identified transition.
From: Carsten Kutzner <[EMAIL PROTECTED]>
Reply-To: Discussion list for GROMACS users <[email protected]>
To: Discussion list for GROMACS users <[email protected]>
Subject: Re: [gmx-users] different results when using different number cpus
Date: Wed, 05 Dec 2007 14:10:06 +0100

Hi Dechang,

it is normal that results are not binary identical if you compare the "same" MD system on different numbers of processors. If you use PME then you will probably get slightly different charge grids for 2 and for 16 processors, since the charge grid has to be divisible by the number of CPUs in x- and y-direction.

Even if you manually set the grid dimensions to be the same for both cases, your simulations could diverge when using version 3.x of the FFTW. This version has a built-in timer and chooses the fastest of several algorithms, and the choice can differ even between two runs on the same number of processors, depending on the timing results. With different algorithms you get slight differences in the last digit of the computed numbers (rounding / truncation / order of evaluation), which will then grow during the simulation and lead to diverging trajectories. Of course the averaged properties of the simulation are unaffected by those differences and should be the same if averaged long enough.

You could use FFTW 2.x and manually set the FFT grid size to the same value for the 2 and 16 CPU case, but I am not sure if this is enough to get binary identical results.

You could also repeat your simulations several times with (slightly) different starting conditions (maybe different starting velocities) to get a better picture of the average behaviour of your system. If in all 16 processor cases you see the proteins diverge and in all 2 processor cases you see them converge, I would guess something is wrong.

Hope that helps,
Carsten

Dechang Li wrote:
> Dear all,
>
> I used Gromacs 3.3.1 to do a simulation of two proteins in water (tip3p).
> I ran two similar simulations, one on 2 cpus and the other on 16 cpus.
> The two simulations have the same .gro, .top, and .mdp files. I found
> that the results were not the same. In the 2 cpu simulation, the two proteins
> move closer and closer. But they move apart in the 16 cpu simulation.
> Is it normal to get different results when using different numbers of cpus? The
> size of my simulation box is 9*7*7.
>
> Best regards,
>
> 2007-12-5
>
> =========================================
> Dechang Li, PhD Candidate
> Department of Engineering Mechanics
> Tsinghua University
> Beijing 100084
> PR China
>
> Tel: +86-10-62773779(O)
> Email: [EMAIL PROTECTED]
> =========================================
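A small sketch of the "order of evaluation" effect Carsten describes. This is plain Python, not GROMACS or FFTW code. The helper names, the random test data and the idea of modelling "N CPUs" as N partial sums are all assumptions made for illustration. It only shows that floating-point addition is not associative, so splitting one sum into a different number of chunks can change the last digits of the result.

```python
import random


def naive_sum(values):
    """Plain left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum in n_chunks partial sums, then add the partials.

    A rough stand-in for a reduction spread over n_chunks processors.
    """
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


# Same three numbers, two groupings: the results differ in the last digit.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)
print("groupings agree:", left_first == right_first)

# Same 100000 terms, reduced as 2 chunks versus 16 chunks.
rng = random.Random(2007)  # fixed seed so the data are repeatable
terms = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]
sum_2 = chunked_sum(terms, 2)
sum_16 = chunked_sum(terms, 16)
print("difference between 2 and 16 chunks:", abs(sum_2 - sum_16))
```

Any difference between the two chunked sums is at the level of rounding error, which is harmless for a single number but is exactly the kind of seed that a chaotic simulation amplifies over many steps.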
_______________________________________________
gmx-users mailing list    [email protected]
http://www.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to [EMAIL PROTECTED]
Can't post? Read http://www.gromacs.org/mailing_lists/users.php

