While looking into another problem, I ran into an issue that makes ob1 segfault.
Using gm and running test_dan1 from the onesided test suite, I get a segfault
if I lower the gm free list maximum too far. That is,

mpirun -np 2 -mca btl gm,self -mca btl_gm_free_list_max 1024 test_dan1

works fine, but

mpirun -np 2 -mca btl gm,self -mca btl_gm_free_list_max 512 test_dan1

segfaults. Here is the relevant output from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1077541088 (LWP 15600)]
0x404d81c1 in mca_pml_ob1_send_fin (proc=0x9bd9490, bml_btl=0xd323580, 
    hdr_des=0x9e54e78, order=255 '\377', status=1) at pml_ob1.c:267
267         MCA_PML_OB1_DES_ALLOC(bml_btl, fin, order, sizeof(mca_pml_ob1_fin_hdr_t));
(gdb) bt
#0  0x404d81c1 in mca_pml_ob1_send_fin (proc=0x9bd9490, bml_btl=0xd323580, 
    hdr_des=0x9e54e78, order=255 '\377', status=1) at pml_ob1.c:267
#1  0x404eef7a in mca_pml_ob1_send_request_put_frag (frag=0xa711f00)
    at pml_ob1_sendreq.c:1141
#2  0x404d986e in mca_pml_ob1_process_pending_rdma () at pml_ob1.c:387
#3  0x404eed57 in mca_pml_ob1_put_completion (btl=0x9c37e38, ep=0x9c42c78, 
    des=0xb62ad00, status=0) at pml_ob1_sendreq.c:1108
#4  0x404ff520 in mca_btl_gm_put_callback (port=0x9bec5e0, context=0xb62ad00, 
    status=GM_SUCCESS) at btl_gm.c:682
#5  0x40512c4f in gm_handle_sent_tokens (p=0x9bec5e0, e=0x406189c0)
    at ./libgm/gm_handle_sent_tokens.c:82
#6  0x40517c73 in _gm_unknown (p=0x9bec5e0, e=0x406189c0)
    at ./libgm/gm_unknown.c:222
#7  0x405180fc in gm_unknown (p=0x9bec5e0, e=0x406189c0)
    at ./libgm/gm_unknown.c:300
#8  0x40502708 in mca_btl_gm_component_progress () at btl_gm_component.c:649
#9  0x404f6fd6 in mca_bml_r2_progress () at bml_r2.c:110
#10 0x401a51d3 in opal_progress () at runtime/opal_progress.c:201
#11 0x405cf864 in opal_condition_wait (c=0x9e564b8, m=0x9e56478)
    at ../../../../opal/threads/condition.h:98
#12 0x405cf68e in ompi_osc_pt2pt_module_fence (assert=0, win=0x9e55ec8)
    at osc_pt2pt_sync.c:142
#13 0x400b6ebb in PMPI_Win_fence (assert=0, win=0x9e55ec8) at pwin_fence.c:57
#14 0x0804a2f3 in test_bandwidth1 (nbufsize=1050000, min_iterations=10, 
    max_iterations=1000, verbose=0) at test_dan1.c:282
#15 0x0804b06f in get_bandwidth (argc=0, argv=0x0) at test_dan1.c:686
#16 0x080512f5 in test_dan1 () at test_dan1.c:3555
#17 0x08051573 in main (argc=1, argv=0xbfeba9f4) at test_dan1.c:3639
(gdb) 

This is using the trunk. Any ideas?

Thanks,

Tim
