On Apr 8, 2013 8:53 AM, "João Henriques" <joao.henriques.32...@gmail.com> wrote:
>
> Dear all,
>
> Due to cluster wall-time limitations, I was forced to restart two REMD
> simulations. It ran absolutely fine until hitting the wall-time. To restart
> I used the following command:
>
> mpirun -np 64 -output-filename MPIoutput $GromDir/mdrun_mpi -s H5_.tpr
> -multi 64 -replex 1000 -deffnm H5_ -cpi -noappend
>
> (I'm using GMX-4.0.7 and yes I know it's old but I have my own reasons for
> using it.)
>
> Here is a random replica (#1) MPI output:
>
> ######START#######
> NNODES=64, MYRANK=1, HOSTNAME=an091
> NODEID=1 argc=11
> Checkpoint file is from part 1, new output files will be suffixed part0002.
> Reading file H5_1.tpr, VERSION 4.0.7 (single precision)
>
> Reading checkpoint file H5_1.cpt generated: Wed Apr 3 17:13:14 2013
>
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 4.0.7
> Source code file: main.c, line: 116
>
> Fatal error:
> The 64 subsystems are not compatible
>
> -------------------------------------------------------
>
> Error on node 1, will try to stop all the nodes
> Halting parallel program mdrun_mpi on CPU 1 out of 64
> ######END#######
>
> It's reading from the correct cpt and tpr files, so it must be something
> else.
>
> Here is a tail of the respective log file:
>
> ######START#######
> Initializing Replica Exchange
> Repl There are 64 replicas:
> Multi-checking the number of atoms ... OK
> Multi-checking the integrator ... OK
> Multi-checking init_step+nsteps ... OK
> Multi-checking first exchange step: init_step/-replex ...
> first exchange step: init_step/-replex is not equal for all subsystems
> subsystem 0: 3062
> subsystem 1: 3062
> subsystem 2: 3062
> subsystem 3: 3062
> subsystem 4: 3062
> subsystem 5: 3062
> subsystem 6: 3062
> subsystem 7: 3062
> subsystem 8: 3062
> subsystem 9: 3062
> subsystem 10: 3062
> subsystem 11: 3062
> subsystem 12: 3062
> subsystem 13: 3062
> subsystem 14: 3062
> subsystem 15: 3062
> subsystem 16: 3062
> subsystem 17: 3062
> subsystem 18: 3062
> subsystem 19: 3062
> subsystem 20: 3062
> subsystem 21: 3062
> subsystem 22: 3062
> subsystem 23: 3062
> subsystem 24: 3062
> subsystem 25: 3062
> subsystem 26: 3062
> subsystem 27: 3062
> subsystem 28: 3062
> subsystem 29: 3062
> subsystem 30: 3062
> subsystem 31: 3062
> subsystem 32: 3062
> subsystem 33: 3062
> subsystem 34: 3062
> subsystem 35: 3062
> subsystem 36: 3062
> subsystem 37: 3062
> subsystem 38: 3062
> subsystem 39: 3066
It seems subsystem 39 got its I/O done faster, so its checkpoint is ahead of the others. Its state_prev.cpt will be at 3062. Back up your files. Use gmxcheck to see what's in the files. Rename them as suitable so that your set of files is consistent.

Mark

> subsystem 40: 3062
> subsystem 41: 3062
> subsystem 42: 3062
> subsystem 43: 3062
> subsystem 44: 3062
> subsystem 45: 3062
> subsystem 46: 3062
> subsystem 47: 3062
> subsystem 48: 3062
> subsystem 49: 3062
> subsystem 50: 3062
> subsystem 51: 3062
> subsystem 52: 3062
> subsystem 53: 3062
> subsystem 54: 3062
> subsystem 55: 3062
> subsystem 56: 3062
> subsystem 57: 3062
> subsystem 58: 3062
> subsystem 59: 3062
> subsystem 60: 3062
> subsystem 61: 3062
> subsystem 62: 3062
> subsystem 63: 3062
>
> -------------------------------------------------------
> Program mdrun_mpi, VERSION 4.0.7
> Source code file: main.c, line: 116
>
> Fatal error:
> The 64 subsystems are not compatible
>
> -------------------------------------------------------
> ######END#######
>
> It's clear that "init_step/-replex is not equal for all subsystems" is the
> problem, but does anyone know why this is happening and how to solve it?
>
> Thank you for your patience,
> Best regards,
>
> João Henriques
> --
> gmx-users mailing list gmx-users@gromacs.org
> http://lists.gromacs.org/mailman/listinfo/gmx-users
> * Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
> * Please don't post (un)subscribe requests to the list. Use the
> www interface or send it to gmx-users-requ...@gromacs.org.
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
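
For anyone hitting the same error with many replicas, the odd one out in the "subsystem N: value" listing can be picked out with a short script instead of by eye. This is only a sketch in Python and not part of GROMACS. The function name find_outliers and the sample text are made up for illustration, and it assumes the log lines have exactly the form shown in the quoted log.

```python
import re
from collections import Counter

# Matches lines such as "subsystem 39: 3066", with or without a
# leading "> " quote marker in front of them.
SUBSYSTEM_RE = re.compile(r"subsystem\s+(\d+):\s+(\d+)")


def find_outliers(log_text):
    """Return (majority_value, {subsystem_index: value}) for subsystems
    whose value differs from the most common one.

    Hypothetical helper: it only parses text, it does not touch any
    checkpoint or run input files.
    """
    values = {int(idx): int(val) for idx, val in SUBSYSTEM_RE.findall(log_text)}
    if not values:
        return None, {}
    majority, _count = Counter(values.values()).most_common(1)[0]
    outliers = {idx: val for idx, val in values.items() if val != majority}
    return majority, outliers


# Illustrative input that mimics the listing in the quoted log:
# 64 subsystems, all at 3062 except subsystem 39 at 3066.
sample = "\n".join(
    f"> subsystem {i}: {3066 if i == 39 else 3062}" for i in range(64)
)

majority, outliers = find_outliers(sample)
print("majority value:", majority)
print("outliers:", outliers)
```

In practice the text would come from the tail of one of the replica log files rather than from the sample string. The script only tells you which replica is out of step; backing up the files and deciding which checkpoint to use still has to be done by hand as described above.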