Mark Abraham wrote:
OK, I have some confirmation of a possible bug here. I used GROMACS 4.0.4
to do reruns on the same positions-only NPT peptide+water trajectory with
the same run input file:
a) compiled without MPI, a single-processor rerun worked correctly,
including "zero" KE and temperature at each frame
b) compiled with MPI, a single-processor rerun worked correctly, including
zero KE and temperature, and agreed with a) to within machine precision
c) compiled with MPI, a 4-processor rerun gave incorrect results: an
approximately-correct temperature and a plausible positive KE were
reported, all PE terms at every step were identical, to about machine
precision, to those of the first step of a) and b), and the reported
pressure was different.
Thus it seems that a multi-processor mdrun is not updating the structure
for subsequent steps in the loop over structures, and/or is getting some
KE from somewhere that a single-processor calculation is not.
I'll step through c) with a debugger tomorrow.
d) compiled with MPI, a 4-processor run using particle decomposition
worked correctly, agreeing with a).
Further, c) has the *same* plausible positive KE at each step.
From stepping through a run, I think the rerun problem with domain
decomposition (DD) arises because a rerun loads the data from the rerun
trajectory into rerun_fr and later copies it into state, but not into
state_global. state_global is initialized from the .tpr file (which *has*
velocities) and is used for the DD initialization, but it is never
subsequently updated. So, for each rerun step, the same .tpr state gets
propagated, which leads to all the symptoms I describe above. The KE
comes from the velocities in the .tpr file, and is thus constant.
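
To make the suspected mechanism concrete, here is a minimal toy model in C.
It is not GROMACS code: toy_state, toy_partition and toy_rerun are
hypothetical names, all masses are set to 1, and the order of operations
(frame copied into the local state, then the local state rebuilt from the
global one) is assumed from the description above. The "refresh" branch
also clears the global velocities, which is an extra assumption of the toy
for a positions-only trajectory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NATOMS  2
#define NFRAMES 3

typedef double rvec[3];

/* Hypothetical stand-in for a simulation state; NOT the GROMACS t_state. */
typedef struct {
    rvec x[NATOMS]; /* positions  */
    rvec v[NATOMS]; /* velocities */
} toy_state;

/* Stand-in for repartitioning: the local state is rebuilt from the global one. */
static void toy_partition(const toy_state *global, toy_state *local)
{
    memcpy(local, global, sizeof(*local));
}

/* Kinetic energy with unit masses: 0.5 * sum over atoms of |v|^2. */
static double toy_kinetic_energy(const toy_state *s)
{
    double ke = 0.0;
    for (int i = 0; i < NATOMS; i++)
        for (int d = 0; d < 3; d++)
            ke += 0.5 * s->v[i][d] * s->v[i][d];
    return ke;
}

/* Toy rerun loop. For each frame, records the KE and the x coordinate of
 * atom 0 as seen by the local state after repartitioning.
 * refresh_global == 0 models the suspected bug (global state never updated);
 * refresh_global != 0 copies each frame into the global state first. */
void toy_rerun(int refresh_global, double ke_out[NFRAMES], double x0_out[NFRAMES])
{
    assert(ke_out != NULL && x0_out != NULL);

    /* "Run input" state: has velocities, like the .tpr in the report. */
    toy_state global = {
        .x = {{0.0, 0.0, 0.0}, {1.0, 0.0, 0.0}},
        .v = {{1.0, 0.0, 0.0}, {0.0, 2.0, 0.0}}
    };

    for (int f = 0; f < NFRAMES; f++) {
        /* Positions-only frame: atom i sits at x = f + 1 + i. */
        rvec frame_x[NATOMS] = {{0.0}};
        for (int i = 0; i < NATOMS; i++)
            frame_x[i][0] = (double)(f + 1 + i);

        /* The frame is copied into the local state only ... */
        toy_state local;
        memset(&local, 0, sizeof(local));
        memcpy(local.x, frame_x, sizeof(local.x));

        if (refresh_global) {
            /* Assumed remedy: also push the frame into the global state. */
            memcpy(global.x, frame_x, sizeof(global.x));
            memset(global.v, 0, sizeof(global.v));
        }

        /* ... and then overwritten when the local state is rebuilt. */
        toy_partition(&global, &local);

        ke_out[f] = toy_kinetic_energy(&local);
        x0_out[f] = local.x[0][0];
    }
}
```

With refresh_global set to 0, every frame reports the same KE (2.5 with
the numbers above) and the run-input position of atom 0, which mirrors the
symptoms in c). With refresh_global set to 1, the KE is 0 and the positions
follow the trajectory frames.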
So, a preliminary work-around is to use mdrun -rerun -pd to get particle
decomposition.
I tried to hack a fix for the DD code. It seemed that using
    for (i=0; i<state_global->natoms; i++)
        copy_rvec(rerun_fr.x[i],state_global->x[i]);
before about line 1060 of do_md() in src/kernel/md.c should do the
trick, since with bMasterState set for a rerun, dd_partition_system()
should propagate state_global to the right places. However I got a
segfault in that copy_rvec with i==0, despite state_global.x being
allocated and of the right dimensions according to Totalview's memory
debugger.
I'll file a bugzilla in any case.
Mark