Hello everybody,

I would like to chime in here too, although my problem might not be directly related.

> The problem of Gromacs stalling on i7 when using multiple CPUs is an MPI
> problem. It is most likely caused by a shared memory bug in Open MPI
> that was fixed in the latest release (1.4.1).
>
> Switching to openmpi-1.4.1 solves the problem.

We are using openmpi-1.4.1 on Nehalem CPUs. With the current Gromacs 4.0.7 I see reproducible segfaults whenever core binding is used, either via "numactl --cpunodebind=0 --membind=0 mpirun ..." or via "mpirun --mca mpi_paffinity_alone 1", e.g.

/usr/local/x86_64.Linux/bin/mpirun -np 4 --mca mpi_paffinity_alone 1 /usr/local/stow/gromacs407/x86_64.Linux/bin/mdrun_407_mpi_d

[neuro36a:01728] *** Process received signal ***
[neuro36a:01728] Signal: Segmentation fault (11)
[neuro36a:01728] Signal code: Address not mapped (1)
[neuro36a:01728] Failing at address: 0x8
[neuro36a:01728] [ 0] [0x7ff120]
[neuro36a:01728] [ 1] [0x7f15a7]
[neuro36a:01728] [ 2] [0x7cadeb]
[neuro36a:01728] [ 3] [0x7cacd5]
[neuro36a:01728] [ 4] [0x6cc533]
[neuro36a:01728] [ 5] [0x6d704e]
[neuro36a:01728] [ 6] [0x4a1f9e]
[neuro36a:01728] [ 7] [0x49c6cc]
[neuro36a:01728] [ 8] [0x40e046]
[neuro36a:01728] [ 9] [0x800749]
[neuro36a:01728] [10] [0x4001b9]
[neuro36a:01728] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 1728 on node neuro36a exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

mpirun -V
mpirun (Open MPI) 1.4.1

Everything works as expected if no core binding is used at all. The serial version, built the same way but without the --enable-mpi switch, shows no problems when used with numactl. The numactl/mpirun combination, although a bit unusual, works fine with other codes (e.g. CPMD, VASP, ...), as does the usual "mpirun --mca mpi_paffinity_alone" switch.

Since we use CPU binding to partition an eight-core node into two SGE slots with four cores each, this situation is not optimal. I will try openmpi 1.4.2 as soon as it is released, though.

Best Regards

Christof Köhler

--
Dr. rer. nat. Christof Köhler       email: [email protected]
Universitaet Bremen/ BCCMS          phone: +49-(0)421-218-2486
Am Fallturm 1/ TAB/ Raum 3.12       fax:   +49-(0)421-218-4764
28359 Bremen
PGP: http://www.bccms.uni-bremen.de/fileadmin/BCCMS/pgp_keys/ChristofKoehler_UniBremen.asc
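P.S. In case the build configuration matters: the two binaries were configured roughly along these lines. This is a sketch from memory, the prefix, the program suffixes and the exact make targets are placeholders and may differ from the script we actually used; the only difference between the two builds is the --enable-mpi switch.

  # serial double-precision build (placeholder prefix/suffix)
  ./configure --prefix=/usr/local/stow/gromacs407 \
              --enable-double \
              --program-suffix=_407_d
  make && make install

  # MPI build: identical, plus --enable-mpi
  ./configure --prefix=/usr/local/stow/gromacs407 \
              --enable-double \
              --enable-mpi \
              --program-suffix=_407_mpi_d
  make mdrun && make install-mdrun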
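The slot partitioning we are after looks roughly like the following, i.e. one four-core job pinned to the cores and memory of each NUMA node of the eight-core box. The -deffnm job names are just placeholders for whatever each SGE slot is running.

  # SGE slot 1: cores and memory of NUMA node 0
  numactl --cpunodebind=0 --membind=0 \
      mpirun -np 4 /usr/local/stow/gromacs407/x86_64.Linux/bin/mdrun_407_mpi_d -deffnm job_node0

  # SGE slot 2: cores and memory of NUMA node 1
  numactl --cpunodebind=1 --membind=1 \
      mpirun -np 4 /usr/local/stow/gromacs407/x86_64.Linux/bin/mdrun_407_mpi_d -deffnm job_node1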

