Dear Gromacs users,

I tried to run a REMD simulation with GROMACS 5.1 that is relaunched every hour (in a queuing system) using the -maxh option. The first time it was launched, it worked: the run stopped at the maxh time, was relaunched from the checkpoint files, and continued the simulation. But during this second run, when the maxh time was reached (step 1981062), GROMACS said that it was going to stop, yet it did not stop until the system killed the job (step 2545600).

I tried different maxh times (1, 0.95, and 0.20 hours) to be sure that the time between maxh and the cluster's maximum time was sufficient, but in every case the second run continued until it reached the one-hour limit and was killed by the system.
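
For reference, the margins involved can be worked out explicitly. The sketch below assumes that mdrun starts its shutdown at 0.99 times the -maxh value (consistent with the 0.198 hours reported in the log for -maxh 0.2) and that the queue limit is one hour; the function name stop_margin_s is hypothetical and not part of GROMACS or Slurm.

```shell
# Hypothetical helper: seconds left between the -maxh stop notice and the
# queue's wall-clock limit.
# Assumption: mdrun announces the stop at 0.99 * maxh.
#   $1 = value passed to -maxh (hours)
#   $2 = wall-clock limit of the queue (hours)
stop_margin_s() {
    LC_ALL=C awk -v maxh="$1" -v limit_h="$2" \
        'BEGIN { printf "%.1f\n", (limit_h - 0.99 * maxh) * 3600 }'
}

# The three -maxh values tried, against the one-hour limit.
stop_margin_s 0.2 1     # prints 2887.2
stop_margin_s 0.95 1    # prints 214.2
stop_margin_s 1 1       # prints 36.0
```

With -maxh 0.2 this leaves about 48 minutes before the limit, so a lack of margin alone would not explain the second run being killed.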

I find it very strange that it works the first time and that the second time GROMACS says it has to stop but does not. Moreover, I tried the same job with a classical simulation (without REMD), and this time there was no problem. Did I forget an option needed to make maxh compatible with REMD? I searched the web and the mailing list but did not find any reported problems between maxh and REMD.

Do you have any idea what the problem is?

Here are the command lines in my script myJob.slurm:
---------------------
srun --mpi=pmi2 -K1 --resv-ports -n $SLURM_NTASKS mdrun_mpi -ntomp 1 -multi 8 -replex 500 -maxh 0.2 -deffnm mdA_ -cpi mdA_.cpt -cpo mdA_.cpt -v 2>> remdA.log
# resubmit the same job at the end for a long run:
sbatch myJob.slurm
---------------------

Here is part of my remdA.log file:
---------------------
starting mdrun 'myPeptide'
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
starting mdrun 'myPeptide'
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).
120000000 steps, 240000.0 ps (continuing from step 655701,   1311.4 ps).

Step 1981061: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

Step 1981062: Run time exceeded 0.198 hours, will terminate the run

step 1981100, will finish Sat Mar 26 11:14:40 2016
step 1981200, will finish Sat Mar 26 11:14:40 2016
...
step 2545600, will finish Sat Mar 26 11:15:49 2016
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step

Received the TERM signal, stopping at the next NS step
---------------------
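
The size of the overrun can be read directly from the log. The following is a minimal sketch that assumes the log format shown in the excerpt ("Step N: Run time exceeded ..." for the stop notice and "step N, will finish ..." for the progress lines written by -v); the function name overrun_steps is hypothetical.

```shell
# Hypothetical helper: number of steps the run continued after the first
# -maxh stop notice, taken from an mdrun log such as remdA.log.
overrun_steps() {
    log="$1"
    # First step at which a replica announced the -maxh stop.
    stop_step=$(grep -o 'Step [0-9][0-9]*: Run time exceeded' "$log" \
        | head -n 1 | grep -o '[0-9][0-9]*')
    # Last progress step written by -v. Progress lines may be separated by
    # carriage returns, so convert those to newlines first.
    last_step=$(tr '\r' '\n' < "$log" | grep -o '^step [0-9][0-9]*' \
        | tail -n 1 | grep -o '[0-9][0-9]*')
    echo "$((last_step - stop_step))"
}

# Example (file name taken from the script above):
# overrun_steps remdA.log
```

For the excerpt above, the first notice is at step 1981061 and the last reported step is 2545600, which gives 564539 steps run after the stop notice.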

Thanks a lot,

Maud




