No obvious problems. Please open an issue at redmine.gromacs.org when you have something reproducible, but don't hurry; nobody is likely to have time to check it out for a week or two.
Cheers,
Mark

On Sep 9, 2013 5:11 PM, "Richard Broadbent" <richard.broadben...@imperial.ac.uk> wrote:

> Hi Mark,
>
> Thanks for the quick response.
>
> On 09/09/13 15:45, Mark Abraham wrote:
>
>> Sounds worrying :-( Thanks for the detailed report and
>> trouble-shooting! So far, I can't think of a reason for it.
>>
>> A couple of suggestions:
>> * try again with 4.6.3 (at least while trouble-shooting) in case it's
>> a fixed bug
>
> I'll test that side by side with 4.6.1; that way we'll have both for
> comparison.
>
>> * post a representative .mdp file
>
> It's below this message. The production run is built using tpbconv
> -extend on the .tpr built from that .mdp.
>
>> * is there anything out of the ordinary in the topology?
>
> I built the residues myself, but they're just standard polymer monomer
> units; nothing out of the ordinary.
>
>> * if the problem is restart-related and shows up in the drift quickly,
>> then you can probably find a reproducible case via a job that does
>> lots of short-interval restarts and saves all the intermediate files -
>> a (set of) inputs that can reproduce the problem sounds like what we'd
>> need to diagnose and/or fix anything
>
> I'm already starting to build them and will be testing them tomorrow
> (a sketch of such a restart loop appears at the end of this thread).
>
>> * does it happen in a non-multi simulation? (or more particularly,
>> what are you doing with -multi?)
>
> The -multi was used to move the job into a faster queue; I've seen it
> in non -multi jobs as well.
>
>> * check .log files for warnings, and that there are none being
>> suppressed at the grompp stage
>
> There are no errors at the grompp stage. I haven't identified any
> warnings in the mdrun logs, but I'll have another look before I'm 100%
> certain there aren't any in there; I couldn't see any on a first read
> through.
>
>> * see if the group cut-off scheme in 4.6.x shows the same problem
>
> Will do.
>
>> Mark
>
> Thanks,
>
> Richard
>
>
> integrator              = md
> bd_fric                 = 0
>
> dt                      = 0.002
>
> nsteps                  = 2500000
>
> comm_mode               = linear
>
> nstxout                 = 100000
> nstvout                 = 100000
> nstfout                 = 0
>
> xtc_grps                = P84
> nstxtcout               = 50000
>
> nstlog                  = 100000
>
> nstenergy               = 50000
>
> pbc                     = xyz
> periodic_molecules      = no
>
> ns_type                 = grid
> nstlist                 = 10
>
> rlist                   = 1.25
> optimize_fft            = yes
> fourier_nx              = 128
> fourier_ny              = 128
> fourier_nz              = 128
>
> pme_order               = 4
> epsilon_r               = 1.0
>
> coulombtype             = pme
> coulomb-modifier        = Potential-shift-Verlet
> rcoulomb                = 1.2
>
> vdwtype                 = cut-off
> vdw-modifier            = Potential-shift-Verlet
>
> rvdw                    = 1.20
>
> DispCorr                = EnerPres
>
> tcoupl                  = no
>
> nsttcouple              = 5
>
> pcoupl                  = no
>
> constraints             = h-bonds
>
> lincs_order             = 6
> lincs_iter              = 2
>
> cutoff-scheme           = Verlet
> verlet-buffer-drift     = -1
>
>
>> On Mon, Sep 9, 2013 at 4:08 PM, Richard Broadbent
>> <richard.broadbent09@imperial.ac.uk> wrote:
>>
>>> Dear All,
>>>
>>> I've been analysing a series of long (200 ns) NVE simulations (md
>>> integrator) on ~93,000-atom systems. I ran the simulations in groups
>>> of 3 using the -multi option in GROMACS v4.6.1 double precision.
>>>
>>> Simulations were run with 1 OpenMP thread per MPI process.
>>>
>>> The simulations were restarted at regular intervals using the
>>> following submission script:
>>>
>>>
>>> FILE=4.6_P84_DIO_
>>>
>>> module load fftw xe-gromacs/4.6.1
>>>
>>> # Change to the directory that the job was submitted from
>>> cd $PBS_O_WORKDIR
>>>
>>> # MPI rank count and ranks per node, read from the PBS resource request
>>> export NPROC=`qstat -f $PBS_JOBID | grep mppwidth | awk '{print $3}'`
>>> export NTASK=`qstat -f $PBS_JOBID | grep mppnppn | awk '{print $3}'`
>>>
>>> aprun -n $NPROC -N $NTASK mdrun_mpi_d -deffnm $FILE -maxh 24 -multi 3 -npme 64 -append -cpi
>>>
>>>
>>> ###
>>>
>>> The first simulation was run with the same script, except that the
>>> mdrun line was
>>>
>>> aprun -n $NPROC -N $NTASK mdrun_mpi_d -deffnm $FILE -maxh 24 -multi 3 -npme 64
>>>
>>> ###
>>>
>>>
>>> The simulations generally ran and restarted without trouble; however,
>>> in several simulations the energy drift changed radically following
>>> the restart.
>>>
>>> In one case, the simulation ran for 50 ns (including one restart)
>>> with a drift of -141.6 +/- 0.1 kJ mol^-1 ns^-1, was restarted and
>>> then had a drift of +104 +/- 1 kJ mol^-1 ns^-1 for 15 ns, then was
>>> restarted again and continued with a drift of -138 +/- 0.1 kJ mol^-1
>>> ns^-1 for a further 50 ns.
>>>
>>> The other 2 simulations running in parallel with this calculation
>>> through the -multi option did not experience a change in gradient.
>>>
>>> The drifts were calculated by least-squares analysis of the total
>>> energy data given by
>>>
>>> echo "total" | g_energy_d -f ${FILE}${i}.edr -o total_${FILE}${i}.xvg -xvg none
>>>
>>> (a minimal sketch of such a fit appears at the end of this thread).
>>>
>>> The simulation writes to the .edr every 20 ps. The transition is
>>> masked by the expected oscillations in energy due to the integrator
>>> on a 2 ns interval, but the change in drift is clear when looking at
>>> a 4 ns range centred on the restart.
>>>
>>> The hardware used was of the same specification for all jobs: 27 Cray
>>> XE6 nodes (9 nodes per simulation), 32 MPI processes per node.
>>>
>>> The simulations use the Verlet cut-off scheme, and there are H-bond
>>> constraints enforced using LINCS (order 6, 2 iterations).
>>>
>>> I can't think what would cause this change in the drift during a
>>> restart. However, I have seen it in simulations run on both an AMD
>>> system (Cray XE6, AVX-FMA) and an Intel system (SGI ICE, SSE4.1).
>>>
>>> I have some data generated using the same procedure with v4.5.5 and
>>> v4.5.7 (different cut-off scheme), and the restarts in those systems
>>> have not caused any appreciable changes in the simulations.
>>>
>>> Unfortunately I didn't save the checkpoint files used for the
>>> restarts (I will in the future). I'm going to try building a new
>>> input file from just before the restart using the .trr trajectory
>>> data.
>>>
>>> Does anyone have any ideas of what might have caused this?
>>>
>>> Has anyone seen similar effects?
>>>
>>> Thanks,
>>>
>>> Richard
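For reference, the least-squares fit described above can be done
directly on the two-column "-xvg none" output; here is a minimal awk
sketch (my own illustration, not the script actually used). It assumes
the g_energy defaults of time in ps and energy in kJ/mol, so the slope
is multiplied by 1000 to give kJ mol^-1 ns^-1:

    awk '{ n++; sx += $1; sy += $2; sxx += $1*$1; sxy += $1*$2 }
         END { slope = (n*sxy - sx*sy) / (n*sxx - sx*sx)
               printf "drift = %.3f kJ mol^-1 ns^-1\n", slope*1000
             }' total_${FILE}${i}.xvg

Fitting each restart segment separately, rather than the concatenated
series, makes a change of slope at a particular restart stand out.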
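And for the short-interval restart test suggested above, a minimal
sketch of such a loop (again an illustration, not a tested job script:
the base name, the 100 ps extension, and the 20-cycle count are
assumptions, the initial .tpr is assumed to have been built with a
short nsteps, and tpbconv_d is the double-precision tpbconv matching
the mdrun_mpi_d/g_energy_d binaries used above):

    #!/bin/bash
    FILE=4.6_P84_DIO_0      # illustrative base name
    mkdir -p intermediates

    for i in `seq 1 20`; do
        # Extend the run length by 100 ps; GROMACS backs up the old .tpr.
        tpbconv_d -s ${FILE}.tpr -extend 100 -o ${FILE}.tpr
        # Restart from the checkpoint; -noappend writes .partNNNN output
        # files, so no earlier output is appended to or overwritten.
        # (On the Cray, prefix with aprun as in the submission script.)
        mdrun_mpi_d -deffnm ${FILE} -cpi ${FILE}.cpt -noappend
        # Archive the checkpoint the next cycle will restart from.
        cp ${FILE}.cpt intermediates/${FILE}_cycle${i}.cpt
    done

Running the drift fit on each .partNNNN .edr segment should then show
whether a particular restart introduces the jump.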