I am using GROMACS 3.3.2 and 3.3.1, and I get the following problem with both of them.
If I run replica exchange on more than 4 processors (2 and 4 are fine), the simulations finish, but MPI gives the following errors, so the job never terminates. This is the end of my log file:

-----------------------------------------------------------------------

               NODE (s)   Real (s)      (%)
       Time: 158483.430 159636.000     99.3
                       1d20h01:23
               (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
Performance:     18.919    818.029      2.726      8.805

p13_15442:  p4_error: Timeout in establishing connection to remote process: 0
p12_15407:  p4_error: Timeout in establishing connection to remote process: 0
Broken pipe
p11_2364:  p4_error: Timeout in establishing connection to remote process: 0
p9_20588:  p4_error: Timeout in establishing connection to remote process: 0
p10_2329:  p4_error: Timeout in establishing connection to remote process: 0
Broken pipe
Broken pipe
Broken pipe
Broken pipe
p6_24137:  p4_error: Timeout in establishing connection to remote process: 0
p7_24172:  p4_error: Timeout in establishing connection to remote process: 0
Broken pipe
Broken pipe

I have tried installing on three different clusters, using different versions of mpich, and they all do this. BUT I do not get the error if I am running a single simulation on 8 processors; I only get this problem when I run replica exchange. Any ideas what is going on?
I'm also including my submission script. Perhaps I am missing something, but I'm just not seeing it.

#!/bin/bash
#
#$ -N switch_less
#$ -pe mpich 8
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
#$ -l h_rt=00:05:00

MPIDIR=/opt/mpich/intel/bin/
MDDIR=/soft/linux/pkg/gromacs-3.3.1/bin
SYSTEM=free
INDEX=0

for T in 80 82 84 86 87 88 89 90
do
    sed "s/TTTT/$T/g" MDRUN > mdrun.$INDEX.mdp
    $MDDIR/grompp \
        -f mdrun.$INDEX \
        -c $SYSTEM.gro \
        -p $SYSTEM.top \
        -po mdout.$INDEX \
        -o $SYSTEM$INDEX.tpr
    let "INDEX += 1"
done

if test $NSLOTS -eq $INDEX
then
    $MPIDIR/mpirun -v -np $NSLOTS -machinefile $TMPDIR/machines \
        -nolocal $MDDIR/mdrun-mpi -v \
        -np $NSLOTS \
        -multi $NSLOTS \
        -replex 50 \
        -s $SYSTEM.tpr \
        -o $SYSTEM \
        -c $SYSTEM.out \
        -g $SYSTEM \
        -e $SYSTEM \
        -x $SYSTEM
else
    echo 'wrong number of nodes for the number of replicas'
fi

I have tried using the -debug option when running gromacs, but I can't tell what is going on with it. Is there something I should look for in the debug logfile?

Thanks,
-Paul
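
P.S. The replica-count test at the end of the script only runs after every grompp call has finished. A minimal sketch of doing the same comparison up front is below. It assumes a hypothetical helper named check_replicas (not part of GROMACS, mpich, or the script above), reuses the same temperature list, and falls back to 8 slots when NSLOTS is not set by the queueing system. It only checks the counts and is not expected to change the p4_error behaviour.

```shell
#!/bin/bash
# Hypothetical helper (an illustration, not part of the original script):
# compare the slot count with the number of replica temperatures
# before any grompp or mpirun call is made.
check_replicas() {
    local nslots=$1
    shift
    local nreplicas=$#   # one replica per remaining argument
    if test "$nslots" -eq "$nreplicas"
    then
        echo "ok: $nreplicas replicas on $nslots slots"
        return 0
    else
        echo "wrong number of nodes for the number of replicas ($nreplicas replicas, $nslots slots)"
        return 1
    fi
}

# Same temperature list as in the submission script.
TEMPS="80 82 84 86 87 88 89 90"

# NSLOTS is normally set by the queueing system; 8 is an assumed fallback.
# TEMPS is left unquoted on purpose so that it splits into one argument per temperature.
check_replicas "${NSLOTS:-8}" $TEMPS || echo "aborting before grompp"
```

In a real job the last line would exit instead of echoing, so that no .tpr files are generated for a run that cannot start.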