Jeff,

I posted a patch for this on the ticket.

Scott

On Aug 26, 2010, at 10:10 AM, Scott Atchley wrote:

> Hi all,
> 
> I compiled 1.4.3rc1 with MX 1.2.12 on RHEL 5.4 (2.6.18-164.el5). The memory 
> manager and MX do not work together: every run below segfaults. Compiling 
> with --without-memory-manager works fine. The output below is from the 
> default configure (i.e. --with-memory-manager).
> 
> Note, I still see unusual latencies for some tests (reduce-scatter, 
> allgather, etc.) when using the BTL. I do not see them with the MTL. An 
> example of BTL latencies from reduce-scatter (columns: bytes, repetitions, 
> t_min, t_max, t_avg in usec) is:
> 
>          256         1000         7.01         7.01         7.01
>          512         1000         7.56         7.56         7.56
>         1024         1000         8.58         8.58         8.58
>         2048         1000        10.36        10.36        10.36
>         4096         1000        14.49        14.49        14.49
>         8192         1000      5180.16      5180.57      5180.36
>        16384         1000        94.96        94.97        94.96
>        32768         1000      4676.30      4676.68      4676.49
>        65536          640      4625.85      4626.23      4626.04
>       131072          320       243.43       243.46       243.45
>       262144          160       425.56       425.66       425.61
> 
> Scott
> 
> % mpirun -hostfile hosts -np 2 ./IMB-MPI1.ompi-1.4.3rc1 pingpong
> [rain16:22509] *** Process received signal ***
> [rain16:22509] Signal: Segmentation fault (11)
> [rain16:22509] Signal code: Address not mapped (1)
> [rain16:22509] Failing at address: 0x2c0
> [rain15:24145] *** Process received signal ***
> [rain15:24145] Signal: Segmentation fault (11)
> [rain15:24145] Signal code: Address not mapped (1)
> [rain15:24145] Failing at address: 0x25a0
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 22509 on node rain16 exited on 
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> gdb shows:
> 
> #0  0x0000003d084075c8 in ?? () from /lib64/libgcc_s.so.1
> (gdb) bt
> #0  0x0000003d084075c8 in ?? () from /lib64/libgcc_s.so.1
> #1  0x0000003d0840882b in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> #2  0x0000003d060e5eb8 in backtrace () from /lib64/libc.so.6
> #3  0x00002af68e7a47de in opal_backtrace_buffer ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #4  0x00002af68e7a24ce in show_stackframe ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #5  <signal handler called>
> #6  0x00000000000002c0 in ?? ()
> #7  0x00002af690520640 in mca_mpool_fake_release_memory ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/openmpi/mca_mpool_fake.so
> #8  0x00002af68e2f49ce in mca_mpool_base_mem_cb ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #9  0x00002af68e78347b in opal_mem_hooks_release_hook ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #10 0x00002af68e7a791f in opal_mem_free_ptmalloc2_munmap ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #11 0x00002af68e7ac2b1 in opal_memory_ptmalloc2_free_hook ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #12 0x0000003d060727c1 in free () from /lib64/libc.so.6
> #13 0x00002af69197aaad in mx__rl_fini (rl=0xab5f928)
>    at ../../../libmyriexpress/userspace/../mx__request.c:102
> #14 0x00002af69196924d in mx_close_endpoint (endpoint=0xab5f820)
>    at ../../../libmyriexpress/userspace/../mx_close_endpoint.c:124
> #15 0x00002af69155e3dc in ompi_mtl_mx_finalize ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/openmpi/mca_mtl_mx.so
> #16 0x00002af68e2f87e0 in mca_pml_base_select ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #17 0x00002af68e2bcf40 in ompi_mpi_init ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #18 0x00002af68e2da2b1 in PMPI_Init_thread ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #19 0x0000000000403359 in main ()
> 
> 
> If I tell it to use BTLs only it changes to:
> 
> % mpirun -mca pml ob1 -hostfile hosts -np 2 ./IMB-MPI1.ompi-1.4.3rc1 pingpong
> [rain16:22552] *** Process received signal ***
> [rain15:24195] *** Process received signal ***
> [rain15:24195] Signal: Segmentation fault (11)
> [rain15:24195] Signal code: Address not mapped (1)
> [rain15:24195] Failing at address: 0x290
> [rain16:22552] Signal: Segmentation fault (11)
> [rain16:22552] Signal code: Address not mapped (1)
> [rain16:22552] Failing at address: 0x290
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 22552 on node rain16 exited on 
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> gdb shows:
> 
> #0  0x0000003d084075c8 in ?? () from /lib64/libgcc_s.so.1
> #1  0x0000003d0840882b in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> #2  0x0000003d060e5eb8 in backtrace () from /lib64/libc.so.6
> #3  0x00002b8310ee17de in opal_backtrace_buffer ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #4  0x00002b8310edf4ce in show_stackframe ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #5  <signal handler called>
> #6  0x0000000000000290 in ?? ()
> #7  0x00002b8312c5d640 in mca_mpool_fake_release_memory ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/openmpi/mca_mpool_fake.so
> #8  0x00002b8310a319ce in mca_mpool_base_mem_cb ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #9  0x00002b8310ec047b in opal_mem_hooks_release_hook ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #10 0x00002b8310ee5195 in sYSTRIm ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #11 0x00002b8310ee92da in opal_memory_ptmalloc2_free_hook ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #12 0x0000003d060727c1 in free () from /lib64/libc.so.6
> #13 0x0000003d060960bd in closedir () from /lib64/libc.so.6
> #14 0x00002b8310ec7cc9 in foreachfile_callback ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #15 0x00002b8310ec797a in foreach_dirinpath ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #16 0x00002b8310ec7a1e in lt_dlforeachfile ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #17 0x00002b8310ecf2a5 in mca_base_component_find ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #18 0x00002b8310ecfc75 in mca_base_components_open ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #19 0x00002b8310a2eb46 in ompi_dpm_base_open ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #20 0x00002b83109fa3c2 in ompi_mpi_init ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #21 0x00002b8310a172b1 in PMPI_Init_thread ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #22 0x0000000000403359 in main ()
> 
> 
> Lastly, with just the MTL:
> 
> % mpirun -mca pml cm -hostfile hosts -np 2 ./IMB-MPI1.ompi-1.4.3rc1 pingpong
> [rain16:22607] *** Process received signal ***
> [rain15:24247] *** Process received signal ***
> [rain15:24247] Signal: Segmentation fault (11)
> [rain15:24247] Signal code: Address not mapped (1)
> [rain15:24247] Failing at address: 0x38e0
> [rain16:22607] Signal: Segmentation fault (11)
> [rain16:22607] Signal code: Address not mapped (1)
> [rain16:22607] Failing at address: 0x38e0
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 22607 on node rain16 exited on 
> signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> 
> gdb shows:
> 
> #0  0x0000003d084075c8 in ?? () from /lib64/libgcc_s.so.1
> #1  0x0000003d0840882b in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> #2  0x0000003d060e5eb8 in backtrace () from /lib64/libc.so.6
> #3  0x00002afa78ae87de in opal_backtrace_buffer ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #4  0x00002afa78ae64ce in show_stackframe ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #5  <signal handler called>
> #6  0x00000000000038e0 in ?? ()
> #7  0x00002afa7a864640 in mca_mpool_fake_release_memory ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/openmpi/mca_mpool_fake.so
> #8  0x00002afa786389ce in mca_mpool_base_mem_cb ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #9  0x00002afa78ac747b in opal_mem_hooks_release_hook ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #10 0x00002afa78aec195 in sYSTRIm ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #11 0x00002afa78af02da in opal_memory_ptmalloc2_free_hook ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #12 0x0000003d060727c1 in free () from /lib64/libc.so.6
> #13 0x00002afa78acec45 in foreachfile_callback ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #14 0x00002afa78ace97a in foreach_dirinpath ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #15 0x00002afa78acea1e in lt_dlforeachfile ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #16 0x00002afa78ad62a5 in mca_base_component_find ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #17 0x00002afa78ad6c75 in mca_base_components_open ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libopen-pal.so.0
> #18 0x00002afa7863ca26 in ompi_pubsub_base_open ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #19 0x00002afa78601394 in ompi_mpi_init ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #20 0x00002afa7861e2b1 in PMPI_Init_thread ()
>   from /nfs/home/atchley/projects/openmpi-1.4.3rc1/build/rain/lib/libmpi.so.0
> #21 0x0000000000403359 in main ()
> 
> 
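
The latency anomaly in the quoted reduce-scatter table can also be picked out mechanically. The following is only a rough sketch, not anything from the original thread: it assumes the five columns are bytes, repetitions, t_min, t_max and t_avg (usec), and it uses a hypothetical rule (flag a message size whose average latency is higher than that of some larger message size). The names ROWS, parse_rows and find_spikes are made up for illustration.

```python
# Rows copied from the quoted reduce-scatter output.
# ASSUMPTION: columns are bytes, repetitions, t_min, t_max, t_avg (usec).
ROWS = """\
256 1000 7.01 7.01 7.01
512 1000 7.56 7.56 7.56
1024 1000 8.58 8.58 8.58
2048 1000 10.36 10.36 10.36
4096 1000 14.49 14.49 14.49
8192 1000 5180.16 5180.57 5180.36
16384 1000 94.96 94.97 94.96
32768 1000 4676.30 4676.68 4676.49
65536 640 4625.85 4626.23 4626.04
131072 320 243.43 243.46 243.45
262144 160 425.56 425.66 425.61
"""


def parse_rows(text):
    """Return (bytes, repetitions, t_avg) tuples, skipping malformed lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue
        rows.append((int(parts[0]), int(parts[1]), float(parts[4])))
    return rows


def find_spikes(rows):
    """Hypothetical heuristic: a size is suspicious if its average latency
    is higher than the average latency of any larger message size."""
    spikes = []
    best_later = float("inf")  # lowest t_avg seen among larger sizes
    for size, _reps, t_avg in reversed(rows):
        if t_avg > best_later:
            spikes.append(size)
        best_later = min(best_later, t_avg)
    return sorted(spikes)


if __name__ == "__main__":
    print(find_spikes(parse_rows(ROWS)))
```

On the numbers quoted above this flags the 8192, 32768 and 65536 byte rows, which are the ones that stand out by eye.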
