So it seems that there is a problem in the shared memory communication
layer of OpenMPI that only shows up sporadically. However, if it is not
reproducible, it could also be a physical memory problem, i.e. bad DIMMs,
especially since you have data corruption every once in a while. One
test you can do: take a big file (much larger than the amount of
memory you have) and run md5sum on it a few times. Copy the file to a
"good" machine and run it there as well. It should always give the same
result. If you can rule out hardware, then OpenMPI could be the problem.
You could try the latest LAM or MPICH 2.x (not 1.x!).
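
The repeated-checksum test could be scripted roughly as below. This is
only a sketch, not a replacement for md5sum: the function names
(md5_of_file, repeated_md5, is_consistent), the 1 MiB chunk size and the
default of 5 passes are arbitrary choices made for illustration. As noted
above, the file should be much larger than RAM so that each pass is
really read again rather than served from cache.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_md5(path, passes=5):
    """Hash the same file several times and return every digest."""
    return [md5_of_file(path) for _ in range(passes)]


def is_consistent(digests):
    """True if all passes produced the same digest."""
    return len(set(digests)) == 1


if __name__ == "__main__":
    if len(sys.argv) > 1:
        results = repeated_md5(sys.argv[1])
        for number, value in enumerate(results, start=1):
            print(f"pass {number}: {value}")
        print("consistent" if is_consistent(results) else "MISMATCH")
    else:
        print("usage: python md5check.py BIGFILE")
```

Running it on the suspect node and on a "good" machine and comparing the
printed digests covers both halves of the test. A mismatch between passes
on one machine points at hardware (memory, disk or controller) rather
than at the MPI library.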
Our sysadmin has run Memtest-86 v3.3 and found no problem in 4 passes.
I will look into MPICH 2.x (we found OpenMPI to run 10% faster than
LAM, so we don't really want to go back).
Thanks for the reply,
Chris.
_______________________________________________
gmx-users mailing list gmx-users@gromacs.org
http://www.gromacs.org/mailman/listinfo/gmx-users