On 4/06/2011 8:26 AM, Dimitar Pachov wrote:
At first, I thought the -append option of the mdrun command was great.
However, I don't think it is anymore and have actually started
questioning myself why it exists in the first place, and second, why
has it become the default option in the newest versions?
It exists because it used to be a pain to manage your simulation file
numbering.
It is useless unless you run your simulations in a mode that is 100%
safe from unexpected problems (hardware, restarts, etc.), which is never the
case. It is beyond me how such an option can become the default and
how a statement like this:
"By default the output will be appending to the existing output files.
The checkpoint file contains checksums of all output files, such that
*you will never lose data when some output files are modified,
corrupt or removed.*"
can be claimed without testing ALL of the scenarios that can lead to
problems, that is, lost data.
The checkpoint file records the position of the output file pointers at
the time of the checkpoint, along with an MD5 checksum. Upon restarting
with -append, mdrun seeks to that file pointer position, verifies the
checksum and issues a fatal error if this is not possible. So if
checkpoint and other files are not altered or removed after a crash,
then the method seems pretty safe to me.
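
The restart check described above can be sketched in a few lines. This is only a toy illustration, not GROMACS code: the function names `make_checkpoint` and `restart_with_append` are hypothetical, and hashing the whole file up to the recorded offset is an assumption made to keep the example short.

```python
import hashlib
import os


def make_checkpoint(output_path):
    """Record the current end of the output file and an MD5 of its contents."""
    with open(output_path, "rb") as f:
        data = f.read()
    return {"offset": len(data), "md5": hashlib.md5(data).hexdigest()}


def restart_with_append(output_path, checkpoint):
    """Verify the output file against the checkpoint, then resume at its offset.

    Raises RuntimeError (the stand-in for a fatal error) if the file is
    missing, too short, or differs from what the checkpoint recorded.
    """
    if not os.path.exists(output_path):
        raise RuntimeError("output file is missing; refusing to append")
    with open(output_path, "rb") as f:
        head = f.read(checkpoint["offset"])
    if len(head) != checkpoint["offset"]:
        raise RuntimeError("output file is shorter than the checkpoint offset")
    if hashlib.md5(head).hexdigest() != checkpoint["md5"]:
        raise RuntimeError("checksum mismatch; output file was modified")
    # Drop anything written after the checkpoint so new output follows on cleanly.
    with open(output_path, "r+b") as f:
        f.truncate(checkpoint["offset"])
    return checkpoint["offset"]
```

In this sketch, data written after the checkpoint is discarded on restart, and data written before it is never touched unless the verification passes.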
The above text mentions you are safe even if you remove files - that's
an overstatement. However, I can't see that removing a non-checkpoint
file could lead to loss of useful data from other non-checkpoint files.
If one uses that option and the run is restarted and is again
restarted before reaching the point of attempting to write a file,
then things are lost,
If this is true, then it wants fixing, and fast, and will get it :-)
However, it would be surprising for such a problem to exist and not have
been reported up to now. This feature has been in the code for a year
now, and while some minor issues have been fixed since the 4.5 release,
it would surprise me greatly if your claim was true.
You're saying the equivalent of the steps below can occur:
1. Simulation wanders along normally and writes a checkpoint at step 1003
2. Random crash happens at step 1106
3. An -append restart from the old .tpr and the recent .cpt file will
restart from step 1003
4. Random crash happens at step 1059
5. Now a restart doesn't restart from step 1003, but some other step.
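
For reference, the behaviour one would expect from that sequence can be written down as a toy model. This is a sketch under stated assumptions, not the mdrun implementation: the trajectory is an in-memory list, one frame is written per step, and `run_until_crash` and `checkpoint_steps` are hypothetical names chosen for the illustration.

```python
def run_until_crash(trajectory, checkpoint, crash_step, checkpoint_steps):
    """Restart from `checkpoint`, append one frame per step, stop at `crash_step`.

    Returns the most recent checkpoint, which is unchanged if the crash
    happens before the next checkpoint step is reached.
    """
    # Appending restart: discard frames written after the checkpoint.
    del trajectory[checkpoint["length"]:]
    step = checkpoint["step"]
    while step < crash_step:
        step += 1
        trajectory.append(step)
        if step in checkpoint_steps:
            checkpoint = {"step": step, "length": len(trajectory)}
    return checkpoint


trajectory = []
checkpoint = {"step": 0, "length": 0}
checkpoint_steps = {1003}

# Steps 1-2: checkpoint at step 1003, crash at step 1106.
checkpoint = run_until_crash(trajectory, checkpoint, 1106, checkpoint_steps)
# Steps 3-4: restart from 1003, crash again at step 1059 (no new checkpoint).
checkpoint = run_until_crash(trajectory, checkpoint, 1059, checkpoint_steps)
# Step 5: the next restart should still begin from step 1003.
restart_step = checkpoint["step"]
checkpoint = run_until_crash(trajectory, checkpoint, 1200, checkpoint_steps)
```

In this model the second crash leaves the checkpoint untouched, so the third run again starts from step 1003 and the frames before it are never rewritten. A real failure would have to come from something the model leaves out.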
and most importantly, the most important piece of data, that being the
trajectory file, could be completely lost! I don't know the code
behind the checkpointing & appending, but I can see how easily one can
overwrite 100ns trajectories, for example, and "obtain" the same
trajectories of size .... 0.
I don't see how easy that is, without a concrete example, where user
error is not possible.
Using the checkpoint capability & appending makes sense when many
restarts are expected, but unfortunately it is exactly then that these
options completely fail! As a new user of Gromacs, I must say I am
disappointed, and would like to obtain an explanation of why the usage
of these options is clearly stated to be safe when it is not, and why
the append option is the default, and why at least a single warning
has not been posted anywhere in the docs & manuals?
I can understand and sympathize with your frustration if you've
experienced the loss of a simulation. Do be careful when suggesting that
others' actions are blame-worthy, however. The developers all act in
good faith on a largely volunteer basis. Errors in coding do happen, and
they do get attention as developers' time permits. However, developers'
time rarely permits addressing "feature X doesn't work, why not?" in a
productive way. Solving bugs can be hard, but will be easier (and solved
faster!) if the user who thinks a problem exists follows good procedure.
See http://www.chiark.greenend.org.uk/~sgtatham/bugs.html
Mark