On 5/06/2011 11:08 PM, Francesco Oteri wrote:
Dear Dimitar,
I'm following the debate regarding:
The point was not "why" I was getting the restarts, but the fact
itself that I was getting restarts close together in time, as I stated in my
first post. I also don't actually know whether the jobs are deleted or
suspended. I thought that a job returned to the queue would
basically start from the beginning when it is later moved to an empty slot,
so I don't understand the difference from that perspective.
In your second mail you say:
Submitted by:
========================
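ii=1
ifmpi="mpirun -np $NSLOTS"
--------
# First submission only: keep the original .tpr and extend the run to t = 200000
if [ ! -f run${ii}-i.tpr ]; then
  cp run${ii}.tpr run${ii}-i.tpr
  tpbconv -s run${ii}-i.tpr -until 200000 -o run${ii}.tpr
fi
# Give each attempt its own log: md-1-0.out, md-1-1.out, ...
k=`ls md-${ii}*.out | wc -l`
outfile="md-${ii}-$k.out"
# Continue from the checkpoint written by the previous attempt
if [[ -f run${ii}.cpt ]]; then
  $ifmpi `which mdrun` -s run${ii}.tpr -cpi run${ii}.cpt -v -deffnm run${ii} -npme 0 > $outfile 2>&1
fi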
=========================
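As quoted, the script launches mdrun only when run${ii}.cpt already exists. A requeued job picks up where it stopped only because -cpi points mdrun at that checkpoint; without it, mdrun would start again from the beginning of run${ii}.tpr. A minimal sketch of the same decision as a reusable function, assuming only the mdrun options -s, -deffnm and -cpi that already appear in the script; the name mdrun_cmd is hypothetical, and it just prints the command line instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the mdrun command line for a run named $1.
# -cpi is added only when a checkpoint from an earlier attempt exists,
# so a requeued job continues instead of starting over.
mdrun_cmd() {
  local name="$1"
  local cmd="mdrun -s ${name}.tpr -deffnm ${name}"
  if [ -f "${name}.cpt" ]; then
    # Continue from the last checkpoint written by the previous attempt
    cmd="$cmd -cpi ${name}.cpt"
  fi
  echo "$cmd"
}
```

In the job script it could be used as $ifmpi $(mdrun_cmd run${ii}) -v -npme 0 > $outfile 2>&1; leaving $(...) unquoted lets the shell split the string into separate arguments, which is safe here because the file names contain no spaces.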
If I understand correctly, you are submitting the SERIAL mdrun. This means
that mpirun -np $NSLOTS starts $NSLOTS copies of mdrun at the same time.
Each copy is an INDEPENDENT instance. Therefore checkpoint files, one for
each instance (i.e. one for each CPU), are written at the same time.
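Whether the mdrun found by `which mdrun` is MPI-enabled can be checked without relying on its name. The sketch below rests on an assumption, not on anything GROMACS guarantees: an MPI build is usually dynamically linked against an MPI library whose name contains libmpi, so it scans ldd output for that string. A negative result is not conclusive: a static binary lists no shared libraries, and GROMACS 4.5 builds without external MPI may still run several threads through the built-in thread-MPI. The function name looks_mpi_linked is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: read `ldd` output on stdin and report whether any
# linked library looks like an MPI library (libmpi, libmpich, ...).
looks_mpi_linked() {
  if grep -qi 'libmpi'; then
    echo "mpi"
  else
    # Not proof of a serial build: static or thread-MPI builds land here too
    echo "no-mpi-lib"
  fi
}

# Example use on the binary the job script launches:
#   ldd "$(which mdrun)" | looks_mpi_linked
```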
Good thought, but Dimitar's stdout excerpts from early in the thread do
indicate the presence of multiple execution threads. Dynamic load
balancing gets turned on, and the domain decomposition (DD) grid is 4x2x1
for his 8 processors.
Conventionally, and by default in the installation process, the
MPI-enabled binaries get an "_mpi" suffix, but this isn't enforced, or
enforceable :-)
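The DD observation is simple arithmetic: a 4x2x1 grid has 4*2*1 = 8 domains, and because the job runs with -npme 0 (no separate PME ranks) every rank owns one domain, which matches the 8 slots. A small sketch of that comparison, assuming the grid is written as NxMxK and the slot count is taken from $NSLOTS as in the job script; dd_cells and check_dd are hypothetical helper names, not GROMACS tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of domains in a DD grid written as "NxMxK".
dd_cells() {
  echo "$1" | awk -F'x' '{ print $1 * $2 * $3 }'
}

# Compare the domain count with the number of slots granted to the job.
# With -npme 0 there are no PME-only ranks, so domains should equal ranks.
check_dd() {
  local cells
  cells=$(dd_cells "$1")
  if [ "$cells" -eq "$2" ]; then
    echo "match: $cells domains for $2 slots"
  else
    echo "mismatch: $cells domains for $2 slots"
  fi
}
```

For Dimitar's run this would be check_dd 4x2x1 "$NSLOTS" with NSLOTS=8.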
Mark
--
gmx-users mailing list [email protected]
http://lists.gromacs.org/mailman/listinfo/gmx-users