When is version 3.3.3 due? I did run make distclean on GROMACS when rebuilding it with different MPI versions, but it didn't help.
Any more ideas would be very much appreciated.

Thanks again,
Hadas.

On Tue, 2007-11-27 at 19:13 +0100, David van der Spoel wrote:
> Hadas Leonov wrote:
> > Hi everybody,
> >
> > I've wrote here before but there was no reply, however this problem is
> > crucial since I cannot run Gromacs.
> >
>
> There will be a fix for the GROMACS compilation issue in 3.3.3, which is
> due to the version of the compiler.
>
> The lam error could follow from a previous compilation, however OpenMPI
> should also work.
>
> Maybe it will work if you do make distclean before you rebuild with LAM.
> >
> > I have installed Gromacs 3.3.2 on Mac OSX Leopard. It did not compile
> > with lam-mpi, so I installed it with open-mpi.
> > The compilation error with lam-mpi was:
> >
> > -----
> > mpicc -I/sw/include -framework Accelerate -o grompp topio.o toppush.o
> > topcat.o topshake.o convparm.o tomorse.o sorting.o splitter.o
> > vsite_parm.o readir.o add_par.o topexcl.o toputil.o topdirs.o grompp.o
> > compute_io.o -L/sw/lib ../mdlib/.libs/libmd_mpi.a -L/usr/X11/lib ../
> > gmxlib/.libs/libgmx_mpi.a /usr/local/lib/libfftw3f.a -lm /sw/lib/
> > libXm.dylib /usr/X11/lib/libXt.6.0.0.dylib /usr/X11/lib/libSM.
> > 6.0.0.dylib /usr/X11/lib/libICE.6.3.0.dylib /usr/X11/lib/libXp.
> > 6.2.0.dylib /usr/X11/lib/libXext.6.4.0.dylib /usr/X11/lib/
> > libX11.6.2.0.dylib /usr/X11/lib/libXau.6.0.0.dylib /usr/X11/lib/
> > libXdmcp.6.0.0.dylib
> > Undefined symbols:
> > "_lam_mpi_byte", referenced from:
> > _lam_mpi_byte$non_lazy_ptr in libgmx_mpi.a(network.o)
> > "_lam_mpi_float", referenced from:
> > _lam_mpi_float$non_lazy_ptr in libgmx_mpi.a(network.o)
> > "_lam_mpi_comm_world", referenced from:
> > _lam_mpi_comm_world$non_lazy_ptr in libgmx_mpi.a(network.o)
> > ld: symbol(s) not found
> > collect2: ld returned 1 exit status
> > make[3]: *** [grompp] Error 1
> > make[2]: *** [all-recursive] Error 1
> > make[1]: *** [all] Error 2
> > make: *** [all-recursive] Error 1
> > ---
> >
> > After installing with openmpi - I ran some benchmarks for 4 processors
> > on Mac-Pro:
> > d.villin:
> > Leopard performance: 13714 ps/day
> > old OS performance: 41143 ps/day.
> > gmx-benchmark : 48000 ps/day.
> >
> > d.poly-ch2
> > Leopard performance: 8640 ps/day
> > old OS performance: 18000 ps/day
> > gmx-benchmark: 20571 ps/day
> >
> > old OS refers to OSX 10.4.9.
> > The slow speed also happens when running only on one CPU. d.villin took
> > 6 times slower than usual. So it can't just be open-mpi fault, can it?
> >
> > Can it be due to compiling gromacs while disabling ia32 optimization?
> >
> > As for crashes: I ran a position restraints of 0.5ns which usually
> > takes 2 hours on 2 CPUs. The prediction of the finish time was 6
> > hours, but it crashed after 40 minutes with the following errors:
> >
> > --
> > step 23070, will finish at Tue Nov 20 23:18:28 2007
> > [tmdec2:69924] *** Process received signal ***
> > [tmdec2:69924] Signal: Segmentation fault (11)
> > [tmdec2:69924] Signal code: Address not mapped (1)
> > [tmdec2:69924] Failing at address: 0x49c78d52
> > [tmdec2:69925] *** Process received signal ***
> > [tmdec2:69925] Signal: Segmentation fault (11)
> > [tmdec2:69925] Signal code: Address not mapped (1)
> > [tmdec2:69925] Failing at address: 0x49aeac55
> > [tmdec2:69926] *** Process received signal ***
> > [tmdec2:69926] Signal: Segmentation fault (11)
> > [tmdec2:69926] Signal code: Address not mapped (1)
> > [tmdec2:69926] Failing at address: 0x48c74d8c
> > [tmdec2:69927] *** Process received signal ***
> > [tmdec2:69927] Signal: Segmentation fault (11)
> > [tmdec2:69927] Signal code: Address not mapped (1)
> > [tmdec2:69927] Failing at address: 0x49e5e700
> > [ 1] [0xbfffd678, 0x49aeac55] (-P-)
> > [ 2] (ompi_ddt_copy_content_same_ddt + 0x7d) [0xbfffd6e8, 0x006f562d]
> > [ 3] (ompi_ddt_sndrcv + 0x3bf) [0xbfffd748, 0x006fbebf]
> > [ 4] [ 1] [0xbfffd678, 0x48c74d8c] (-P-)
> > [ 2] [ 1] [0xbfffd678, 0x49c78d52] (-P-)
> > [ 2] (mca_coll_basic_alltoallv_intra + 0x28b) [0xbfffd7c8, 0x00a3a65b]
> > [ 5] (MPI_Alltoallv + 0x20a) [0xbfffd858, 0x0070056a]
> > [ 6] (ompi_ddt_copy_content_same_ddt + 0x7d) [0xbfffd6e8, 0x006f562d]
> > [ 3] (ompi_ddt_sndrcv + 0x3bf) [0xbfffd748, 0x006fbebf]
> > [ 4] [ 1] [0xbfffd678, 0x49e5e700] (-P-)
> > [ 2] (ompi_ddt_copy_content_same_ddt + 0x7d) [0xbfffd6e8, 0x006f562d]
> > [ 3] (ompi_ddt_sndrcv + 0x3bf) [0xbfffd748, 0x006fbebf]
> > [ 4] (ompi_ddt_copy_content_same_ddt + 0x7d) [0xbfffd6e8, 0x006f562d]
> > [ 3] (ompi_ddt_sndrcv + 0x3bf) [0xbfffd748, 0x006fbebf]
> > [ 4] (mca_coll_basic_alltoallv_intra + 0x28b) [0xbfffd7c8, 0x00a3a65b]
> > [ 5] (MPI_Alltoallv + 0x20a) [0xbfffd858, 0x0070056a]
> > [ 6] (mca_coll_basic_alltoallv_intra + 0x28b) [0xbfffd7c8, 0x00a3a65b]
> > [ 5] (MPI_Alltoallv + 0x20a) [0xbfffd858, 0x0070056a]
> > [ 6] (pmeredist + 0x4e2) [0xbfffd8d8, 0x0004836e]
> > [ 7] (do_pme + 0x494) [0xbfffda38, 0x0004d62b]
> > [ 8] (force + 0x7d9) (mca_coll_basic_alltoallv_intra + 0x28b)
> > [0xbfffd7c8, 0x00a3a65b]
> > [ 5] (MPI_Alltoallv + 0x20a) [0xbfffd858, 0x0070056a]
> > [ 6] [0xbfffdc88, 0x0002ee56]
> > [ 9] (do_force + 0x87a) [0xbfffdd78, 0x0005d652]
> > [10] (pmeredist + 0x4e2) [0xbfffd8d8, 0x0004836e]
> > [ 7] (do_pme + 0x494) [0xbfffda38, 0x0004d62b]
> > [ 8] (pmeredist + 0x4e2) [0xbfffd8d8, 0x0004836e]
> > [ 7] (do_md + 0x164f) [0xbfffe988, 0x0001666e]
> > [11] (mdrunner + 0xb04) [0xbfffeb08, 0x00014abe]
> > [12] (force + 0x7d9) [0xbfffdc88, 0x0002ee56]
> > [ 9] (do_force + 0x87a) [0xbfffdd78, 0x0005d652]
> > [10] (pmeredist + 0x4e2) [0xbfffd8d8, 0x0004836e]
> > [ 7] (do_pme + 0x494) [0xbfffda38, 0x0004d62b]
> > [ 8] (force + 0x7d9) (do_pme + 0x494) [0xbfffda38, 0x0004d62b]
> > [ 8] (force + 0x7d9) [0xbfffdc88, 0x0002ee56]
> > [ 9] (do_force + 0x87a) (main + 0x463) [0xbfffeb98, 0x00018c69]
> > [13] (start + 0x36) [0xbfffebbc, 0x0000216e]
> > [14] [0x00000000, 0x0000000e] (FP-)
> > [tmdec2:69925] *** End of error message ***
> > (do_md + 0x164f) [0xbfffe988, 0x0001666e]
> > [11] (mdrunner + 0xb04) [0xbfffeb08, 0x00014abe]
> > [12] [0xbfffdc88, 0x0002ee56]
> > [ 9] (do_force + 0x87a) [0xbfffdd78, 0x0005d652]
> > [10] (do_md + 0x164f) [0xbfffe988, 0x0001666e]
> > [11] [0xbfffdd78, 0x0005d652]
> > [10] (do_md + 0x164f) [0xbfffe988, 0x0001666e]
> > [11] (mdrunner + 0xb04) [0xbfffeb08, 0x00014abe]
> > [12] (main + 0x463) [0xbfffeb98, 0x00018c69]
> > [13] (start + 0x36) [0xbfffebbc, 0x0000216e]
> > [14] [0x00000000, 0x0000000e] (FP-)
> > [tmdec2:69926] *** End of error message ***
> > (mdrunner + 0xb04) [0xbfffeb08, 0x00014abe]
> > [12] (main + 0x463) [0xbfffeb98, 0x00018c69]
> > [13] (main + 0x463) [0xbfffeb98, 0x00018c69]
> > [13] (start + 0x36) [0xbfffebbc, 0x0000216e]
> > [14] [0x00000000, 0x0000000e] (FP-)
> > [tmdec2:69924] *** End of error message ***
> > (start + 0x36) [0xbfffebbc, 0x0000216e]
> > [14] [0x00000000, 0x0000000e] (FP-)
> > [tmdec2:69927] *** End of error message ***
> > [tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /
> > SourceCache/openmpi/openmpi-5/openmpi/orte/mca/pls/base/
> > pls_base_orted_cmds.c at line 275
> > [tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /
> > SourceCache/openmpi/openmpi-5/openmpi/orte/mca/pls/rsh/
> > pls_rsh_module.c at line 1164
> > [tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /
> > SourceCache/openmpi/openmpi-5/openmpi/orte/mca/errmgr/hnp/errmgr_hnp.c
> > at line 90
> > mpirun noticed that job rank 1 with PID 69925 on node
> > tmdec2.ls.huji.ac.il exited on signal 11 (Segmentation fault).
> > [tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /
> > SourceCache/openmpi/openmpi-5/openmpi/orte/mca/pls/base/
> > pls_base_orted_cmds.c at line 188
> > [tmdec2.ls.huji.ac.il:69921] [0,0,0] ORTE_ERROR_LOG: Timeout in file /
> > SourceCache/openmpi/openmpi-5/openmpi/orte/mca/pls/rsh/
> > pls_rsh_module.c at line 1196
> > --------------------------------------------------------------------------
> > mpirun was unable to cleanly terminate the daemons for this job.
> > Returned value Timeout instead of ORTE_SUCCESS.
> >
> > --------------------------------------------------------------------------
> > 1 additional process aborted (not shown)
> > ---
> >
> > So it looks like the problem is with open-mpi, but if I can't compile
> > with lam, there's not a way of knowing.
> >
> > Help? any ideas?
> >
> > Thanks in advance,
> > Hadas Leonov.
> >
> > _______________________________________________
> > gmx-users mailing list [email protected]
> > http://www.gromacs.org/mailman/listinfo/gmx-users
> > Please search the archive at http://www.gromacs.org/search before posting!
> > Please don't post (un)subscribe requests to the list. Use the
> > www interface or send it to [EMAIL PROTECTED]
> > Can't post? Read http://www.gromacs.org/mailing_lists/users.php
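
As a rough cross-check of the 4-processor benchmark figures quoted above, the slowdown factors can be worked out with a small Python sketch. This is not part of the original messages: the names benchmarks, slowdown and summarize are illustrative, and the only inputs are the ps/day numbers reported in the thread.

```python
# Throughput in ps/day, copied from the 4-processor benchmark figures
# quoted in the thread. The dictionary layout and key names are illustrative.
benchmarks = {
    "d.villin": {"leopard": 13714, "old_os": 41143, "gmx_benchmark": 48000},
    "d.poly-ch2": {"leopard": 8640, "old_os": 18000, "gmx_benchmark": 20571},
}


def slowdown(reference, measured):
    """Return how many times slower `measured` is than `reference`."""
    return reference / measured


def summarize(data):
    """Compute the Leopard slowdown against both reference figures."""
    rows = {}
    for system, v in data.items():
        rows[system] = {
            "vs_old_os": round(slowdown(v["old_os"], v["leopard"]), 2),
            "vs_gmx_benchmark": round(
                slowdown(v["gmx_benchmark"], v["leopard"]), 2
            ),
        }
    return rows


if __name__ == "__main__":
    for system, r in summarize(benchmarks).items():
        print(
            f"{system}: {r['vs_old_os']}x slower than the old OS, "
            f"{r['vs_gmx_benchmark']}x slower than gmx-benchmark"
        )
```

On these numbers the Leopard build comes out about 3x slower than the old OS for d.villin and about 2x slower for d.poly-ch2. The 6x figure mentioned for the single-CPU d.villin run cannot be checked this way, since no raw single-CPU numbers were posted.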

