gianluca santarossa wrote:
Mark Abraham wrote:

You can
a) prevent your simulations from crashing,

I can't. I run simulations on a cluster through a queue, and sometimes the jobs are longer than the max time of the queue.

Yes, you can. Do a pilot run and look at the last few lines of the log file, or better, use the log from one of the crashed runs; you want a run of reasonable length so that the setup time is amortized over all of the timesteps. That will tell you how much simulation time you get per unit of wall-clock time. Now adjust the number of simulation steps accordingly.
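
The arithmetic is simple enough to script. Here is a minimal sketch, assuming you have read the step count and the wall-clock seconds of the pilot run from its log by hand. The function name steps_for_walltime and the safety-margin argument are made up for illustration; neither is part of GROMACS.

```shell
# Hypothetical helper, not part of GROMACS.
# Args: pilot_steps pilot_seconds queue_seconds margin_percent
# Prints how many steps should fit inside the queue limit, scaled
# down by the safety margin. Integer arithmetic, so it rounds down.
steps_for_walltime() {
    pilot_steps=$1
    pilot_seconds=$2
    queue_seconds=$3
    margin_percent=$4
    echo $(( $pilot_steps * $queue_seconds * (100 - $margin_percent) / ($pilot_seconds * 100) ))
}
```

For example, if a pilot of 100000 steps took 3600 s and the queue limit is 86400 s, then "steps_for_walltime 100000 3600 86400 10" prints 2160000, which is the value to put in nsteps in the .mdp file. The 10% margin is a guess; pick whatever leaves room for setup and the final write.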

b) restart from the last frame common to both files, here 8, or

Ok, I want to do it automatically... Moreover, in this way it would be tricky to rebuild the trajectories and the energies.

Indeed, it isn't something you want to do all the time, so see the above solution :-)

c) if the simulations are crashing in response to a signal, use a less vigorous one and gromacs will catch it and exit gracefully, writing output.

This is how my job works now. If you have better ideas, I would be happy to try them... I submit a script to the queue that runs mdrun. The script just traps the signals SIGUSR2 (or, possibly, SIGINT) and copies the trajectories and the energies back to $SOMEWHERE.
The script looks like this:

backup()
{
cp ener.edr traj.trr $SOMEWHERE
...
other stuff
...
}
trap backup SIGUSR2 SIGINT
mdrun > mdrun.log

At the end, if I try to restart the simulation with tpbconv, I sometimes find that, as you said, ener.edr was interrupted while writing. How can I modify the script to let mdrun exit normally?
As you can see, I catch SIGUSR2, not SIGKILL...

That's a reasonable start, but the nature of buffered output is such that you can't guarantee that ener.edr and traj.trr are at the same point. What you need to do is get gromacs to exit gracefully, having flushed its buffers. My PBS setup sends a SIGHUP, which GROMACS 3.3.1 catches; it then does an appropriate end-of-last-step flush and a pirouette to finish :-) I suggest passing the SIGHUP on to mdrun, delaying as long as you can afford and only then copying the files back. This will work better on average. It's probably overkill if you implement the first solution.
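
One way to arrange that in the job script is sketched below. It assumes mdrun exits cleanly on SIGHUP, as it does on the setup described above. The names run_then_copy, forward_hup and OUTPUT_FILES are hypothetical. Instead of sleeping for a fixed delay, the sketch waits until the child process has actually exited, so if the queue kills the job before then, nothing is copied.

```shell
# Files to copy back; the names are the ones used in this thread.
OUTPUT_FILES="ener.edr traj.trr"

# Trap handler: pass the queue's warning on to the child as SIGHUP.
forward_hup() {
    if [ -n "$child" ]; then
        kill -HUP "$child" 2>/dev/null || :
    fi
}

# Usage: run_then_copy DEST COMMAND [ARGS...]
run_then_copy() {
    dest=$1
    shift
    trap forward_hup USR2 INT
    "$@" &            # run in the background so the trap can fire promptly
    child=$!
    # wait returns early when a trapped signal arrives, so keep
    # waiting until the child has really gone.
    while kill -0 "$child" 2>/dev/null; do
        wait "$child" || :
    done
    trap - USR2 INT
    # Left unquoted on purpose so the list splits into separate names.
    cp $OUTPUT_FILES "$dest"
}
```

In the job script the last line would then become something like

    run_then_copy "$SOMEWHERE" mdrun > mdrun.log

Running mdrun in the background matters here: with a foreground command, the shell does not run a trap handler until that command has finished.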

Mark