Hi,
Since commit 18f23724a, our nightly base test has been broken on the v2.x branch.
Strangely, on the v3.x branch it broke the same day with 2fd9510b4b44, but
it was repaired a few days later (I can't tell exactly when, but it was
fixed by fa3d92981a at the latest).
I get segmentation faults or deadlocks in many cases.
Could this be related to issue 5842?
(https://github.com/open-mpi/ompi/issues/5842)
Here is an example backtrace from one of the deadlocks:
#4 <signal handler called>
#5 0x00007f9dc9151d17 in sched_yield () from /lib64/libc.so.6
#6 0x00007f9dc8888cee in opal_progress () at runtime/opal_progress.c:243
#7 0x00007f9dbe53cf78 in ompi_request_wait_completion (req=0x46ea000)
    at ../../../../ompi/request/request.h:392
#8 0x00007f9dbe53e162 in mca_pml_ob1_recv (
    addr=0x7f9dd64a6b30 <assertionValeursIdentiquesSurTousLesProcessus(ompi_communicator_t*, long, long, PAType<long>*, std::__debug::vector<ompi_request_t*, std::allocator<ompi_request_t*> >&)::slValeurs>,
    count=3, datatype=0x7f9dca61e2c0 <ompi_mpi_long>, src=1, tag=32767,
    comm=0x7f9dca62a840 <ompi_mpi_comm_world>, status=0x7ffcf4f08170)
    at pml_ob1_irecv.c:129
#9 0x00007f9dca35f3c4 in PMPI_Recv (
    buf=0x7f9dd64a6b30 <assertionValeursIdentiquesSurTousLesProcessus(ompi_communicator_t*, long, long, PAType<long>*, std::__debug::vector<ompi_request_t*, std::allocator<ompi_request_t*> >&)::slValeurs>,
    count=3, type=0x7f9dca61e2c0 <ompi_mpi_long>, source=1, tag=32767,
    comm=0x7f9dca62a840 <ompi_mpi_comm_world>, status=0x7ffcf4f08170)
    at precv.c:77
#10 0x00007f9dd6261d06 in assertionValeursIdentiquesSurTousLesProcessus (
    pComm=0x7f9dca62a840 <ompi_mpi_comm_world>, pRang=0, pNbProcessus=2,
    pValeurs=0x7f9dd5a94da0 <void girefSynchroniseGroupeProcessusModeDebugImpl<PAGroupeProcessus>(PAGroupeProcessus const&, char const*, int)::slDonnees>,
    pRequetes=std::__debug::vector of length 1, capacity 1 = {...})
    at /pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/src/commun/Parallele/mpi_giref.cc:332
And here is some information about the configuration:
http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2018.10.17.02h16m02s_config.log
http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2018.10.17.02h16m02s_ompi_info_all.txt
Thanks,
Eric
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel