This is a known issue: https://svn.open-mpi.org/trac/ompi/ticket/2087
Maybe its priority should be raised.

Lenny.
On Wed, Dec 30, 2009 at 12:13 PM, Daniel Spångberg <dani...@mkem.uu.se> wrote:

> Dear OpenMPI list,
>
> I have used the dynamic rules for collectives to be able to select one
> specific algorithm. With the latest versions of Open MPI this seems to be
> broken. Merely enabling coll_tuned_use_dynamic_rules causes the code to
> segfault, even though I do not provide a file with rules -- I just want to
> modify the behavior of one routine.
>
> I have tried the example code below on Open MPI 1.3.2, 1.3.3, 1.3.4, and
> 1.4. It *works* on 1.3.2 and 1.3.3, but segfaults on 1.3.4 and 1.4. I have
> confirmed this on Scientific Linux 5.2 and 5.4, and have also reproduced
> the crash using version 1.4 running on Debian etch. All runs were on
> amd64, compiled from source with no configure options other than
> --prefix. The crash occurs whether I use the Intel 11.1 compiler (via env
> CC) or gcc, and regardless of whether the btl is set to openib,self,
> tcp,self, sm,self, or combinations of those. See below for ompi_info and
> other details. MPI_Alltoall, MPI_Alltoallv, and MPI_Allreduce all behave
> the same.
>
> #include <stdlib.h>
> #include <mpi.h>
>
> int main(int argc, char **argv)
> {
>     int rank, size;
>     char *buffer, *buffer2;
>
>     MPI_Init(&argc, &argv);
>
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     buffer  = calloc(100 * size, 1);
>     buffer2 = calloc(100 * size, 1);
>
>     MPI_Alltoall(buffer, 100, MPI_BYTE, buffer2, 100, MPI_BYTE, MPI_COMM_WORLD);
>
>     MPI_Finalize();
>     return 0;
> }
>
> Demonstrated behaviour:
>
> $ ompi_info
> Package: Open MPI daniels@arthur Distribution
> Open MPI: 1.4
> Open MPI SVN revision: r22285
> Open MPI release date: Dec 08, 2009
> Open RTE: 1.4
> Open RTE SVN revision: r22285
> Open RTE release date: Dec 08, 2009
> OPAL: 1.4
> OPAL SVN revision: r22285
> OPAL release date: Dec 08, 2009
> Ident string: 1.4
> Prefix: /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install
> Configured architecture: x86_64-unknown-linux-gnu
> Configure host: arthur
> Configured by: daniels
> Configured on: Tue Dec 29 16:54:37 CET 2009
> Configure host: arthur
> Built by: daniels
> Built on: Tue Dec 29 17:04:36 CET 2009
> Built host: arthur
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: gcc
> C compiler absolute: /usr/bin/gcc
> C++ compiler: g++
> C++ compiler absolute: /usr/bin/g++
> Fortran77 compiler: gfortran
> Fortran77 compiler abs: /usr/bin/gfortran
> Fortran90 compiler: gfortran
> Fortran90 compiler abs: /usr/bin/gfortran
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Sparse Groups: no
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: no
> mpirun default --prefix: no
> MPI I/O support: yes
> MPI_WTIME support: gettimeofday
> Symbol visibility support: yes
> FT Checkpoint support: no (checkpoint thread: no)
> MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.4)
> MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4)
> MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4)
> MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.4)
> MCA carto: file (MCA v2.0, API v2.0, Component v1.4)
> MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4)
> MCA timer: linux (MCA v2.0, API v2.0, Component v1.4)
> MCA installdirs: env (MCA v2.0, API v2.0, Component v1.4)
> MCA installdirs: config (MCA v2.0, API v2.0, Component v1.4)
> MCA dpm: orte (MCA v2.0, API v2.0, Component v1.4)
> MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.4)
> MCA allocator: basic (MCA v2.0, API v2.0, Component v1.4)
> MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: basic (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: inter (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: self (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: sm (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: sync (MCA v2.0, API v2.0, Component v1.4)
> MCA coll: tuned (MCA v2.0, API v2.0, Component v1.4)
> MCA io: romio (MCA v2.0, API v2.0, Component v1.4)
> MCA mpool: fake (MCA v2.0, API v2.0, Component v1.4)
> MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.4)
> MCA mpool: sm (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: cm (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: csum (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.4)
> MCA pml: v (MCA v2.0, API v2.0, Component v1.4)
> MCA bml: r2 (MCA v2.0, API v2.0, Component v1.4)
> MCA rcache: vma (MCA v2.0, API v2.0, Component v1.4)
> MCA btl: self (MCA v2.0, API v2.0, Component v1.4)
> MCA btl: sm (MCA v2.0, API v2.0, Component v1.4)
> MCA btl: tcp (MCA v2.0, API v2.0, Component v1.4)
> MCA topo: unity (MCA v2.0, API v2.0, Component v1.4)
> MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.4)
> MCA osc: rdma (MCA v2.0, API v2.0, Component v1.4)
> MCA iof: hnp (MCA v2.0, API v2.0, Component v1.4)
> MCA iof: orted (MCA v2.0, API v2.0, Component v1.4)
> MCA iof: tool (MCA v2.0, API v2.0, Component v1.4)
> MCA oob: tcp (MCA v2.0, API v2.0, Component v1.4)
> MCA odls: default (MCA v2.0, API v2.0, Component v1.4)
> MCA ras: slurm (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: load_balance (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.4)
> MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.4)
> MCA rml: oob (MCA v2.0, API v2.0, Component v1.4)
> MCA routed: binomial (MCA v2.0, API v2.0, Component v1.4)
> MCA routed: direct (MCA v2.0, API v2.0, Component v1.4)
> MCA routed: linear (MCA v2.0, API v2.0, Component v1.4)
> MCA plm: rsh (MCA v2.0, API v2.0, Component v1.4)
> MCA plm: slurm (MCA v2.0, API v2.0, Component v1.4)
> MCA filem: rsh (MCA v2.0, API v2.0, Component v1.4)
> MCA errmgr: default (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: env (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: hnp (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: singleton (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: slurm (MCA v2.0, API v2.0, Component v1.4)
> MCA ess: tool (MCA v2.0, API v2.0, Component v1.4)
> MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.4)
> MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.4)
>
> $ mpicc -O2 -o bug_openmpi_1.4_test bug_openmpi_1.4_test.c
> $ ldd ./bug_openmpi_1.4_test
> libmpi.so.0 => /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0 (0x00002b33fa57e000)
> libopen-rte.so.0 => /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libopen-rte.so.0 (0x00002b33fa821000)
> libopen-pal.so.0 => /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libopen-pal.so.0 (0x00002b33faa6b000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00000032c7400000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x00000032cfe00000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00000032d4a00000)
> libm.so.6 => /lib64/libm.so.6 (0x00000032c7000000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00000032c7800000)
> libc.so.6 => /lib64/libc.so.6 (0x00000032c6c00000)
> /lib64/ld-linux-x86-64.so.2 (0x00000032c5c00000)
>
> $ mpirun -mca btl tcp,self -mca coll_tuned_use_dynamic_rules 0 -np 8 ./bug_openmpi_1.4_test
> $ mpirun -mca btl tcp,self -mca coll_tuned_use_dynamic_rules 1 -np 8 ./bug_openmpi_1.4_test
> [girasole:27510] *** Process received signal ***
> [girasole:27510] Signal: Segmentation fault (11)
> [girasole:27510] Signal code: (128)
> [girasole:27510] Failing at address: (nil)
> [girasole:27503] *** Process received signal ***
> [girasole:27503] Signal: Segmentation fault (11)
> [girasole:27503] Signal code: (128)
> [girasole:27503] Failing at address: (nil)
> [girasole:27510] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27510] [ 1] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2ae2b29fbeb5]
> [girasole:27510] [ 2] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2ae2b29fa8ca]
> [girasole:27510] [ 3] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f) [0x2ae2ae76bbff]
> [girasole:27510] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27510] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32c6c1d8b4]
> [girasole:27510] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27510] *** End of error message ***
> [girasole:27503] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27503] [ 1] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b534b1b6eb5]
> [girasole:27503] [ 2] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b534b1b58ca]
> [girasole:27503] [ 3] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f) [0x2b5346f26bff]
> [girasole:27503] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27503] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32c6c1d8b4]
> [girasole:27503] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27503] *** End of error message ***
> [girasole:27505] *** Process received signal ***
> [girasole:27505] Signal: Segmentation fault (11)
> [girasole:27505] Signal code: (128)
> [girasole:27505] Failing at address: (nil)
> [girasole:27509] *** Process received signal ***
> [girasole:27509] Signal: Segmentation fault (11)
> [girasole:27509] Signal code: (128)
> [girasole:27509] Failing at address: (nil)
> [girasole:27505] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27505] [ 1] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2ab662aa0eb5]
> [girasole:27505] [ 2] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2ab662a9f8ca]
> [girasole:27505] [ 3] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f) [0x2ab65e810bff]
> [girasole:27505] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27505] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32c6c1d8b4]
> [girasole:27505] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27505] *** End of error message ***
> [girasole:27507] *** Process received signal ***
> [girasole:27507] Signal: Segmentation fault (11)
> [girasole:27507] Signal code: (128)
> [girasole:27507] Failing at address: (nil)
> [girasole:27509] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27509] [ 1] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b7dc1863eb5]
> [girasole:27509] [ 2] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b7dc18628ca]
> [girasole:27509] [ 3] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f) [0x2b7dbd5d3bff]
> [girasole:27509] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27509] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32c6c1d8b4]
> [girasole:27509] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27509] *** End of error message ***
> [girasole:27507] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27507] [ 1] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b09eb873eb5]
> [girasole:27507] [ 2] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b09eb8728ca]
> [girasole:27507] [ 3] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f) [0x2b09e75e3bff]
> [girasole:27507] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27507] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32c6c1d8b4]
> [girasole:27507] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27507] *** End of error message ***
> [girasole:27504] *** Process received signal ***
> [girasole:27504] Signal: Segmentation fault (11)
> [girasole:27504] Signal code: (128)
> [girasole:27504] Failing at address: (nil)
> [girasole:27506] *** Process received signal ***
> [girasole:27506] Signal: Segmentation fault (11)
> [girasole:27506] Signal code: (128)
> [girasole:27506] Failing at address: (nil)
> [girasole:27504] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27504] [ 1] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b6fde1afeb5]
> [girasole:27504] [ 2] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b6fde1ae8ca]
> [girasole:27504] [ 3] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f) [0x2b6fd9f1fbff]
> [girasole:27504] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27504] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32c6c1d8b4]
> [girasole:27504] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27504] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 7 with PID 27510 on node girasole exited
> on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> [girasole:27506] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27506] [ 1] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b66f2908eb5]
> [girasole:27506] [ 2] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b66f29078ca]
> [girasole:27506] [ 3] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f) [0x2b66ee678bff]
> [girasole:27506] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27506] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32c6c1d8b4]
> [girasole:27506] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27506] *** End of error message ***
> [girasole:27508] *** Process received signal ***
> [girasole:27508] Signal: Segmentation fault (11)
> [girasole:27508] Signal code: (128)
> [girasole:27508] Failing at address: (nil)
> [girasole:27508] [ 0] /lib64/libpthread.so.0 [0x32c780de80]
> [girasole:27508] [ 1] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b89b09a1eb5]
> [girasole:27508] [ 2] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/openmpi/mca_coll_tuned.so [0x2b89b09a08ca]
> [girasole:27508] [ 3] /home/daniels/src/MISC/openmpi-1.4/openmpi-1.4_install/lib/libmpi.so.0(MPI_Alltoall+0x15f) [0x2b89ac711bff]
> [girasole:27508] [ 4] ./bug_openmpi_1.4_test(main+0x97) [0x4009b7]
> [girasole:27508] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32c6c1d8b4]
> [girasole:27508] [ 6] ./bug_openmpi_1.4_test [0x400869]
> [girasole:27508] *** End of error message ***
>
> Best regards,
>
> --
> Daniel Spångberg
> Materialkemi
> Uppsala Universitet
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
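A note for readers of this thread: when the dynamic-rules path is not crashing, the tuned component offers two ways to pin a single algorithm -- a per-collective MCA parameter (e.g. coll_tuned_alltoall_algorithm) or an explicit rules file passed via coll_tuned_dynamic_rules_filename. The fragment below is a sketch of a minimal rules file for the reporter's case (one forced alltoall algorithm for an 8-rank communicator), not an authoritative example: the collective ID 3 assumes the alphabetical enum order in the coll_tuned sources, and the four-column rule line (message size, algorithm, fan-in/out, segment size) should be checked against the parser in your Open MPI version. The annotations after each value are explanatory only and likely need to be removed for the real parser.

```text
1          number of collectives configured in this file
3          collective ID (3 = alltoall, assuming the coll_tuned enum order)
1          number of communicator-size rules for this collective
8          communicator size this rule applies to
1          number of message-size rules for this communicator size
0 1 0 0    from message size 0: algorithm 1, fan-in/out 0, segment size 0
```

It would then be activated with something like `mpirun -mca coll_tuned_use_dynamic_rules 1 -mca coll_tuned_dynamic_rules_filename ./alltoall_rules.txt ...`; the per-collective algorithm parameters avoid the file entirely when only one routine needs overriding, which is what the reporter was attempting.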