Dimitar Pachov wrote:
Hello,
Just a quick update after a few short tests that my colleague and I ran.
First, the suggestion
"You can emulate this yourself by calling 'sleep 10s' before mdrun and
see if that's long enough to solve the latency issue in your case."
doesn't work, mainly because this does not appear to be a latency issue,
but also because "sleep" does not change the load on a node.
However, you can reproduce the behavior I have observed pretty easily.
It seems to be related to the values of the pointers (file offsets) to the
*xtc, *trr, *edr, etc. files that are written at the end of the checkpoint
file after abrupt crashes, AND to how often those files are accessed
(opened). How to test:
1. In your input *mdp file, set a high output frequency for coordinates
written to the *xtc file (every 10 steps, for example) and a low one for
the *trr file (every 10,000 steps, for example).
2. Run GROMACS (mdrun -s run.tpr -v -cpi -deffnm run)
3. Abruptly kill the run shortly after starting it (say, after 10-100 steps).
4. You should have a few frames written to the *xtc file and only one
(the first) in the *trr file. The *cpt file should have nonzero values
of "file_offset_low" for all of these files (the pointers have been
updated).
5. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
6. Abruptly kill the run shortly after that (say, after 10-100 steps).
Note that the output interval for the *trr file has not been reached.
7. You should have a few additional frames written to the *xtc file,
while the *trr still has only 1 frame (the first). The *cpt file now has
updated values of "file_offset_low" for all files, BUT the pointer to the
*trr has been set to 0. Obviously, we already know what will happen if we
restart again from this last *cpt file.
8. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
9. Kill it.
10. The *trr file now has size zero.
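The kill/restart cycle in the steps above can be scripted so that it is repeated exactly the same way on another machine. This is only a rough sketch, assuming a POSIX system, that `mdrun` is on the PATH and that `run.tpr` is in the current directory; the helper names (`run_and_kill`, `file_sizes`, `reproduce`) and the fixed kill delay are hypothetical, not part of GROMACS.

```python
import os
import subprocess
import time

# Command from the steps above; assumes mdrun is on PATH and run.tpr exists.
MDRUN_CMD = ["mdrun", "-s", "run.tpr", "-v", "-cpi", "-deffnm", "run"]


def run_and_kill(cmd, seconds):
    """Start cmd, let it run for `seconds`, then kill it abruptly."""
    proc = subprocess.Popen(cmd)
    time.sleep(seconds)
    proc.kill()  # SIGKILL on POSIX: the process gets no chance to clean up
    proc.wait()
    return proc.returncode


def file_sizes(prefix, exts=("xtc", "trr", "edr", "cpt")):
    """Map each extension to the size of prefix.ext in bytes, or None if missing."""
    sizes = {}
    for ext in exts:
        path = f"{prefix}.{ext}"
        sizes[ext] = os.path.getsize(path) if os.path.exists(path) else None
    return sizes


def reproduce(cycles=3, seconds=5.0):
    """Run, kill and restart mdrun `cycles` times, printing file sizes each time."""
    for cycle in range(1, cycles + 1):
        run_and_kill(MDRUN_CMD, seconds)
        print(f"cycle {cycle}: {file_sizes('run')}")
```

Calling `reproduce()` from the run directory performs the three run/kill cycles; the `seconds` delay (an assumption) has to be tuned so that each run is killed after roughly 10-100 steps on the machine in question.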
Therefore, if a run is killed before a file is accessed for writing
(which depends on the chosen output frequency), the file offset values
reported in the *cpt file do not seem to be updated accordingly, and hence
the next restart inevitably leads to overwritten output files.
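The suspected mechanism can be illustrated with a toy model. This is not GROMACS code: it assumes that an appending restart truncates each output file to the offset stored in the checkpoint, and it models the reported behavior by recording an offset of 0 for any file not written during the current segment. The frame size, the 50-step segments and all names are hypothetical.

```python
from dataclasses import dataclass

FRAME_BYTES = 100  # hypothetical fixed frame size in bytes, for illustration only


@dataclass
class OutputFile:
    interval: int                       # write a frame every `interval` steps
    size: int = 0                       # current size on disk, in bytes
    written_this_segment: bool = False


def run_segment(files, start_step, n_steps, record_only_written=True):
    """Simulate one run segment and return the offsets stored in its checkpoint.

    record_only_written=True models the suspected bug: a file that was not
    written during this segment gets an offset of 0 in the checkpoint.
    """
    for f in files.values():
        f.written_this_segment = False
    for step in range(start_step, start_step + n_steps):
        for f in files.values():
            if step % f.interval == 0:
                f.size += FRAME_BYTES
                f.written_this_segment = True
    return {
        name: f.size if (f.written_this_segment or not record_only_written) else 0
        for name, f in files.items()
    }


def restart(files, offsets):
    """Model an appending restart: truncate each file to its checkpoint offset."""
    for name, f in files.items():
        f.size = min(f.size, offsets[name])


def simulate(record_only_written):
    files = {"xtc": OutputFile(interval=10), "trr": OutputFile(interval=10_000)}
    step = 0
    for _ in range(2):  # two run/kill/restart cycles, as in steps 2 to 8
        offsets = run_segment(files, step, 50, record_only_written)
        step += 50
        restart(files, offsets)
    return {name: f.size for name, f in files.items()}
```

With `simulate(True)` the *trr file ends up with size 0 after the second restart, matching step 10; with `simulate(False)`, where the current size is always stored, it keeps its first frame.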
Do you think this is fixable?
Perhaps, but it will require some more details. I cannot reproduce this
problem, and I wonder if it is compiler- or platform-specific. Can you please
provide:
1. Compiler (and version) used to build Gromacs
2. Hardware details
3. Command used to configure Gromacs
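For item 2, basic host details can be collected with Python's standard `platform` module. This is a minimal sketch with a hypothetical helper name; it says nothing about how Gromacs itself was built, so the compiler and configure command (items 1 and 3) still have to be supplied by hand.

```python
import os
import platform


def hardware_summary():
    """Collect basic host details for a bug report."""
    return {
        "system": platform.system(),        # OS name, e.g. "Linux"
        "release": platform.release(),      # OS / kernel release
        "machine": platform.machine(),      # CPU architecture, e.g. "x86_64"
        "processor": platform.processor(),  # may be an empty string on some systems
        "cpu_count": os.cpu_count(),        # None if it cannot be determined
    }


for key, value in hardware_summary().items():
    print(f"{key}: {value}")
```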
-Justin
--
========================================
Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================
--
gmx-users mailing list
http://lists.gromacs.org/mailman/listinfo/gmx-users