On 4/06/2011 8:26 AM, Dimitar Pachov wrote:

At first, I thought the -append option of the mdrun command was great. However, I don't think it is anymore, and I have started asking myself, first, why it exists at all, and second, why it has become the default option in the newest versions.

It exists because it used to be a pain to manage your simulation file numbering.

It is useless unless you run your simulations in a mode that is 100% safe from unexpected problems (hardware, restarts, etc.), which is never the case. It is beyond me how such an option can become the default and how a statement like this:

"By default the output will be appending to the existing output files. The checkpoint file contains checksums of all output files, such that *you will never loose data when some output files are modified, corrupt or removed.*"

can be claimed without testing ALL of the scenarios that can lead to problems, that is, lost data.

The checkpoint file records the position of the output file pointers at the time of the checkpoint, along with an MD5 checksum. Upon restarting with -append, mdrun seeks to that file pointer position, verifies the checksum and issues a fatal error if this is not possible. So if checkpoint and other files are not altered or removed after a crash, then the method seems pretty safe to me.
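To make that mechanism concrete, here is a minimal Python sketch of the idea. It is not the GROMACS implementation: the function names and the checkpoint dictionary are hypothetical, and it assumes the checksum covers the whole file up to the recorded position, which may differ from what mdrun actually hashes.

```python
import hashlib
import os


def write_checkpoint(output_path):
    """Record the current end of the output file and an MD5 of its contents.

    Hypothetical helper: stands in for what a checkpoint stores per output file.
    """
    with open(output_path, "rb") as fh:
        data = fh.read()
    return {"offset": len(data), "md5": hashlib.md5(data).hexdigest()}


def restart_with_append(output_path, checkpoint):
    """Verify the output file against the checkpoint, then resume appending.

    Raises RuntimeError (the analogue of a fatal error) if the file is
    missing, shorter than the recorded position, or fails the checksum.
    """
    if not os.path.exists(output_path):
        raise RuntimeError("output file is missing")
    with open(output_path, "rb") as fh:
        head = fh.read(checkpoint["offset"])
    if len(head) < checkpoint["offset"]:
        raise RuntimeError("output file is shorter than the checkpoint expects")
    if hashlib.md5(head).hexdigest() != checkpoint["md5"]:
        raise RuntimeError("checksum mismatch: output file was modified")
    # Discard anything written after the checkpoint, keep everything before it.
    with open(output_path, "r+b") as fh:
        fh.truncate(checkpoint["offset"])
    return open(output_path, "ab")
```

In this sketch a restart can only remove data written after the last checkpoint. Restarting twice from the same checkpoint, with no new output in between, truncates to the same position both times and leaves the earlier data untouched.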

The above text mentions you are safe even if you remove files - that's an overstatement. However, I can't see that removing a non-checkpoint file could lead to loss of useful data from other non-checkpoint files.

If one uses that option and the run is restarted and is again restarted before reaching the point of attempting to write a file, then things are lost,

If this is true, then it wants fixing, and fast, and will get it :-) However, it would be surprising for such a problem to exist and not have been reported up to now. This feature has been in the code for a year now, and while some minor issues have been fixed since the 4.5 release, it would surprise me greatly if your claim were true.

You're saying the equivalent of the steps below can occur:
1. Simulation wanders along normally and writes a checkpoint at step 1003
2. Random crash happens at step 1106
3. An -append restart from the old .tpr and the recent .cpt file will restart from step 1003
4. Random crash happens at step 1059
5. Now a restart doesn't restart from step 1003, but some other step
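As a toy model of those steps (not GROMACS code; the function, the checkpoint dictionary, and the output interval of 10 steps are all assumptions made for illustration), the expected behaviour looks like this in Python:

```python
def restart(traj, ckpt, crash_step, ckpt_steps=(), out_every=10):
    """Resume from ckpt, run until crash_step, and return the latest checkpoint.

    traj is a list of step numbers standing in for trajectory frames.
    ckpt holds the step of the checkpoint and how many frames existed then.
    """
    # Appending restart: discard frames written after the checkpoint.
    del traj[ckpt["frames"]:]
    for step in range(ckpt["step"] + 1, crash_step + 1):
        if step % out_every == 0:
            traj.append(step)  # write a frame
        if step in ckpt_steps:
            ckpt = {"step": step, "frames": len(traj)}  # write a checkpoint
    return ckpt


traj = []
ckpt = {"step": 0, "frames": 0}
ckpt = restart(traj, ckpt, crash_step=1106, ckpt_steps={1003})  # steps 1-2
ckpt = restart(traj, ckpt, crash_step=1059)                     # steps 3-4
ckpt_before_third_run = dict(ckpt)
ckpt = restart(traj, ckpt, crash_step=1200, ckpt_steps={1150})  # step 5
```

In this model the second crash happens before any new checkpoint is written, so the third run still resumes from step 1003 and every frame up to that checkpoint is kept. A bug of the kind being claimed would mean the real code departs from this behaviour somewhere.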

and most importantly, the most important piece of data, that being the trajectory file, could be completely lost! I don't know the code behind the checkpointing & appending, but I can see how easily one can overwrite 100ns trajectories, for example, and "obtain" the same trajectories of size .... 0.

I don't see how easy that is without a concrete example in which user error is ruled out.

Using the checkpoint capability & appending makes sense when many restarts are expected, but unfortunately that is exactly when these options completely fail! As a new user of Gromacs, I must say I am disappointed, and would like an explanation of why the usage of these options is clearly stated to be safe when it is not, why the append option is the default, and why not even a single warning has been posted anywhere in the docs & manuals.

I can understand and sympathize with your frustration if you've experienced the loss of a simulation. Do be careful when suggesting that others' actions are blame-worthy, however. The developers all act in good faith on a largely volunteer basis. Errors in coding do happen, and they do get attention as developers' time permits. However, developers' time rarely permits addressing "feature X doesn't work, why not?" in a productive way. Solving bugs can be hard, but will be easier (and solved faster!) if the user who thinks a problem exists follows good procedure. See http://www.chiark.greenend.org.uk/~sgtatham/bugs.html

Mark