Hi,

I've run into a problem with the IMB-RMA exchange_get test.  At this point I 
suspect it's an issue in Open MPI or the test itself.  Could someone take a 
look?

I'm running Open MPI 1.8.5 and IMB 4.0.2.  MVAPICH2 is able to run all of 
IMB-RMA successfully.

 mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA

The run eventually hangs at the end of exchange_get (after the 4 MB size is 
reported) during the np=2 pass.  IMB runs each of its tests at every 
power-of-2 np up to and including the np given on the command line.  So, with 
mpirun -np 4 above, IMB runs each of its tests with np=2 and then with np=4.
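
To make that pass schedule concrete, here is a rough Python sketch of the 
process counts I would expect IMB to step through for a given -np.  The 
function name imb_process_counts is made up for illustration and is not part 
of IMB.  The starting count of 2 and the handling of an -np that is not a 
power of 2 are my assumptions, not something I have checked in the IMB source.

```python
def imb_process_counts(np_max, np_min=2):
    """Return the process counts for successive passes.

    Assumption: passes start at np_min, double each time, and always
    finish with np_max itself (even if np_max is not a power of 2).
    """
    if np_max < np_min:
        raise ValueError("np_max must be at least np_min")
    counts = []
    p = np_min
    while p < np_max:
        counts.append(p)
        p *= 2
    counts.append(np_max)  # the np given on the command line is always included
    return counts


print(imb_process_counts(4))  # prints [2, 4]
```

With -np 4 this gives the np=2 pass followed by the np=4 pass, which matches 
what I see in the IMB output.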

If I run just the exchange_get test, the same thing happens:

 mpirun -np 4 -mca pml ob1 -mca btl tcp,sm,self ./IMB-RMA exchange_get

If I run either of the above commands with -np 2, IMB-RMA successfully runs to 
completion.

I have reproduced this with tcp, verbs, and PSM, so it does not appear to be 
transport specific.  MVAPICH2 2.0 works.

Below are backtraces from two of the four ranks.  The other two ranks each 
have a backtrace similar to these two.

Thanks!

Andrew

#0  0x00007fca39a4c0c7 in sched_yield () from /lib64/libc.so.6
#1  0x00007fca393ef2fb in opal_progress () at runtime/opal_progress.c:197
#2  0x00007fca33cd21f5 in opal_condition_wait (m=0x247fc70, c=0x247fcd8)
    at ../../../../opal/threads/condition.h:78
#3  ompi_osc_rdma_flush_lock (module=module@entry=0x247fb50, lock=0x2481a20,
    target=target@entry=3) at osc_rdma_passive_target.c:530
#4  0x00007fca33cd43bd in ompi_osc_rdma_flush (target=3, win=0x2482150)
    at osc_rdma_passive_target.c:578
#5  0x00007fca39fe5654 in PMPI_Win_flush (rank=3, win=0x2482150)
    at pwin_flush.c:58
#6  0x000000000040aec5 in IMB_rma_exchange_get ()
#7  0x0000000000406a35 in IMB_warm_up ()
#8  0x00000000004023bd in main ()

#0  0x00007f1c81890bdd in poll () from /lib64/libc.so.6
#1  0x00007f1c81271c86 in poll_dispatch (base=0x1be8350, tv=0x7fff4c323480)
    at poll.c:165
#2  0x00007f1c81269aa4 in opal_libevent2021_event_base_loop (base=0x1be8350,
    flags=2) at event.c:1633
#3  0x00007f1c812232e8 in opal_progress () at runtime/opal_progress.c:169
#4  0x00007f1c7b9641f5 in opal_condition_wait (m=0x1ccf4a0, c=0x1ccf508)
    at ../../../../opal/threads/condition.h:78
#5  ompi_osc_rdma_flush_lock (module=module@entry=0x1ccf380, lock=0x23287f0,
    target=target@entry=0) at osc_rdma_passive_target.c:530
#6  0x00007f1c7b9663bd in ompi_osc_rdma_flush (target=0, win=0x2317d00)
    at osc_rdma_passive_target.c:578
#7  0x00007f1c81e19654 in PMPI_Win_flush (rank=0, win=0x2317d00)
    at pwin_flush.c:58
#8  0x000000000040aec5 in IMB_rma_exchange_get ()
#9  0x0000000000406a35 in IMB_warm_up ()
#10 0x00000000004023bd in main ()
