I get intermittent deadlocks with the latest trunk. The smallest reproducer
is a shell for loop around a small (2-process), short (20-second) MPI
application. After a few tens of iterations, MPI_Init_thread deadlocks with
the following backtrace:

#0  0x00007fa94b5d9aed in nanosleep () from /lib64/libc.so.6
#1  0x00007fa94b60ec94 in usleep () from /lib64/libc.so.6
#2  0x00007fa94960ba08 in OPAL_PMIX_PMIX1XX_PMIx_Fence (procs=0x0, nprocs=0, info=0x7ffd7934fb90,
    ninfo=1) at src/client/pmix_client_fence.c:100
#3  0x00007fa9498376a2 in pmix1_fence (procs=0x0, collect_data=1) at pmix1_client.c:305
#4  0x00007fa94bb39ba4 in ompi_mpi_init (argc=3, argv=0x7ffd793500a8, requested=3,
    provided=0x7ffd7934ff94) at runtime/ompi_mpi_init.c:645
#5  0x00007fa94bb77281 in PMPI_Init_thread (argc=0x7ffd7934ff8c, argv=0x7ffd7934ff80, required=3,
    provided=0x7ffd7934ff94) at pinit_thread.c:69
#6  0x000000000040150f in main (argc=3, argv=0x7ffd793500a8) at osu_mbw_mr.c:86

On my machines this reproduces 100% of the time, after anywhere between 50
and 100 iterations.

  Thanks,
    George.
