Dimitar Pachov wrote:
Hello,

Just a quick update after a few short tests my colleague and I did. First, the suggestion "You can emulate this yourself by calling 'sleep 10s' before mdrun and see if that's long enough to solve the latency issue in your case." doesn't work, for a few reasons: mainly because this doesn't seem to be a latency issue, but also because "sleep" does not affect the load on a node.

However, you can reproduce the behavior I have observed quite easily. It seems to be related to the values of the pointers to the *xtc, *trr, *edr, etc. files that are written at the end of the checkpoint file after abrupt crashes, AND to how often those files are accessed (opened).

How to test:

1. In your input *mdp file, set a high output frequency for coordinates in the *xtc file (every 10 steps, for example) and a low one for the *trr file (every 10,000 steps, for example).
2. Run GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
3. Abruptly kill the run shortly afterwards (say, after 10-100 steps).
4. You should have a few frames written to the *xtc file and only one frame (the first) in the *trr file. The *cpt file should have non-zero values of "file_offset_low" for all of these files (the pointers have been updated).

5. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
6. Abruptly kill the run shortly afterwards (say, after 10-100 steps). Make sure that the output interval for the *trr file has not been reached.
7. You should have a few additional frames written to the *xtc file, while the *trr file still has only one frame (the first). The *cpt file now has updated "file_offset_low" values for all files, BUT the pointer to the *trr file has acquired a value of 0. Obviously, we already know what will happen if we restart again from this last *cpt file.
8. Restart GROMACS (mdrun -s run.tpr -v -cpi -deffnm run).
9. Kill it.
10. The *trr file now has size zero.

Therefore, if a run is killed before a file has been accessed for writing (which depends on the chosen output frequency), the file offset value stored in the *cpt file for that file does not seem to be updated accordingly, and a subsequent restart inevitably leads to overwritten output files. Do you think this is fixable?
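To make the failure sequence in steps 1-10 concrete, here is a toy Python model of it. It is not GROMACS code and only sketches the reported symptom under stated assumptions: (a) an appending restart cuts each output file back to the offset stored in the checkpoint, (b) offsets are counted in frames rather than bytes, (c) a checkpoint is written whenever a segment ends, and (d) a hypothetical `buggy` flag mimics the observation that a file not written during a segment gets an offset of 0. The names `run_segment` and `simulate` are made up for illustration.

```python
# Toy model of the reported checkpoint-offset problem (hypothetical; not GROMACS code).
# Offsets are counted in frames instead of bytes to keep things readable.


def run_segment(files, ckpt, freqs, n_steps, buggy=True):
    """Simulate one run segment that ends with a checkpoint; return the new checkpoint.

    files: output file name -> list of steps whose frames are on disk
    ckpt:  {"step": first step of this segment, "offsets": name -> frames to keep}
    freqs: output file name -> output interval in steps
    """
    start = ckpt["step"]
    # Assumed restart behaviour: cut every file back to its checkpointed offset.
    for name, frames in files.items():
        del frames[ckpt["offsets"].get(name, 0):]

    written = set()
    for step in range(start, start + n_steps):
        for name, interval in freqs.items():
            if step % interval == 0:
                files[name].append(step)
                written.add(name)

    offsets = {}
    for name, frames in files.items():
        if buggy and name not in written:
            # Reported symptom: a file not written in this segment is recorded as offset 0.
            offsets[name] = 0
        else:
            offsets[name] = len(frames)
    return {"step": start + n_steps, "offsets": offsets}


def simulate(buggy, segments=3, steps_per_segment=50):
    """Run -> kill -> restart `segments` times with xtc every 10 and trr every 10,000 steps."""
    files = {"run.xtc": [], "run.trr": []}
    freqs = {"run.xtc": 10, "run.trr": 10_000}
    ckpt = {"step": 0, "offsets": {}}
    for _ in range(segments):
        ckpt = run_segment(files, ckpt, freqs, steps_per_segment, buggy)
    return files


if __name__ == "__main__":
    for buggy in (True, False):
        files = simulate(buggy)
        print(f"buggy={buggy}: xtc frames={len(files['run.xtc'])}, "
              f"trr frames={len(files['run.trr'])}")
```

With `buggy=True`, the *trr file ends up empty after the third segment (as in step 10) while the *xtc file keeps all 15 frames; with `buggy=False`, the *trr file keeps its single frame from step 0.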


Perhaps, but it will require some more details. I cannot reproduce this problem, and I wonder if it is compiler- or platform-specific. Can you please provide:

1. Compiler (and version) used to build Gromacs
2. Hardware details
3. Command used to configure Gromacs

-Justin

--
========================================

Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
MILES-IGERT Trainee
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin

========================================
--
gmx-users mailing list
http://lists.gromacs.org/mailman/listinfo/gmx-users