Hi folks,

In our nightly runs against the trunk, I have started seeing cases where we appear to segfault in or below malloc. A typical backtrace is below.

Note that this appears to happen only on the trunk, only when we use openib, and only in 32-bit mode. It seems to happen randomly and at a very low frequency (59 out of about 60,000 32-bit openib runs, roughly 0.1%).

This could be a problem with our machine; it has shown up since I started testing 32-bit OFED 10 days ago.

Anyway, I am just curious whether anyone has any ideas.

Thanks,

Tim

--

[odin011:04084] *** Process received signal ***
[odin011:04084] Signal: Segmentation fault (11)
[odin011:04084] Signal code: Invalid permissions (2)
[odin011:04084] Failing at address: 0xf7cbea68
[odin011:04084] [ 0] [0xffffe600]
[odin011:04084] [ 1]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/libopen-pal.so.0(malloc+0x82)
[0xf7e882d2]
[odin011:04084] [ 2]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/libopen-rte.so.0(orte_hash_table_set_proc+0xfa)
[0xf7ec57aa]
[odin011:04084] [ 3]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_peer_lookup+0x11d)
[0xf7cbcebd]
[odin011:04084] [ 4]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_oob_tcp.so(mca_oob_tcp_send_nb+0x1f)
[0xf7cbfccf]
[odin011:04084] [ 5]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_rml_oob.so(orte_rml_oob_send_buffer_nb+0x25a)
[0xf7cddfda]
[odin011:04084] [ 6]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_btl_openib.so
[0xf7c145f1]
[odin011:04084] [ 7]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_btl_openib.so
[0xf7c146e9]
[odin011:04084] [ 8]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_btl_openib.so(mca_btl_openib_endpoint_send+0x345)
[0xf7c0e155]
[odin011:04084] [ 9]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_btl_openib.so(mca_btl_openib_send+0x3e)
[0xf7c0718e]
[odin011:04084] [10]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_start_copy+0x17b)
[0xf7c3c4bb]
[odin011:04084] [11]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x27c)
[0xf7c35adc]
[odin011:04084] [12]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_gather_intra_basic_linear+0x65)
[0xf7bc72a5]
[odin011:04084] [13]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_gather_intra_dec_fixed+0x16a)
[0xf7bba2aa]
[odin011:04084] [14]
/san/homedirs/mpiteam/mtt-runs/odin/20070919-Nightly/pb_4/installs/eiso/install/lib/libmpi.so.0(MPI_Gather+0x18c)
[0xf7f62b6c]
[odin011:04084] [15] src/MPI_Gather_c(main+0x5fd) [0x804a101]
[odin011:04084] [16] /lib/tls/libc.so.6(__libc_start_main+0xd3) [0xf7d0fde3]
[odin011:04084] [17] src/MPI_Gather_c [0x8049a81]
[odin011:04084] *** End of error message ***
