Ah yes, 18f23724a broke things, so we had to fix the fix. That follow-up fix was not 
applied to the v2.x branch. I will open a PR to bring it over.

-Nathan

On Oct 17, 2018, at 11:28 AM, Eric Chamberland 
<eric.chamberl...@giref.ulaval.ca> wrote:

Hi,

Since commit 18f23724a, our nightly base test has been broken on the v2.x branch.

Strangely, on the v3.x branch, it broke the same day with 2fd9510b4b44, but was repaired a few days later (I can't tell exactly when, but at the latest it was fixed by fa3d92981a).

I get segmentation faults or deadlocks in many cases.

Could this be related to issue 5842?
(https://github.com/open-mpi/ompi/issues/5842)

Here is an example backtrace for a deadlock:

#4 <signal handler called>
#5 0x00007f9dc9151d17 in sched_yield () from /lib64/libc.so.6
#6 0x00007f9dc8888cee in opal_progress () at runtime/opal_progress.c:243
#7 0x00007f9dbe53cf78 in ompi_request_wait_completion (req=0x46ea000) at ../../../../ompi/request/request.h:392
#8 0x00007f9dbe53e162 in mca_pml_ob1_recv (addr=0x7f9dd64a6b30 <assertionValeursIdentiquesSurTousLesProcessus(ompi_communicator_t*, long, long, PAType<long>*, std::__debug::vector<ompi_request_t*, std::allocator<ompi_request_t*> >&)::slValeurs>, count=3, datatype=0x7f9dca61e2c0 <ompi_mpi_long>, src=1, tag=32767, comm=0x7f9dca62a840 <ompi_mpi_comm_world>, status=0x7ffcf4f08170) at pml_ob1_irecv.c:129
#9 0x00007f9dca35f3c4 in PMPI_Recv (buf=0x7f9dd64a6b30 <assertionValeursIdentiquesSurTousLesProcessus(ompi_communicator_t*, long, long, PAType<long>*, std::__debug::vector<ompi_request_t*, std::allocator<ompi_request_t*> >&)::slValeurs>, count=3, type=0x7f9dca61e2c0 <ompi_mpi_long>, source=1, tag=32767, comm=0x7f9dca62a840 <ompi_mpi_comm_world>, status=0x7ffcf4f08170) at precv.c:77
#10 0x00007f9dd6261d06 in assertionValeursIdentiquesSurTousLesProcessus (pComm=0x7f9dca62a840 <ompi_mpi_comm_world>, pRang=0, pNbProcessus=2, pValeurs=0x7f9dd5a94da0 <void girefSynchroniseGroupeProcessusModeDebugImpl<PAGroupeProcessus>(PAGroupeProcessus const&, char const*, int)::slDonnees>, pRequetes=std::__debug::vector of length 1, capacity 1 = {...}) at /pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/src/commun/Parallele/mpi_giref.cc:332
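
For context, here is a minimal sketch of the blocking-receive pattern visible in frames #8-#10: rank 0 waits in MPI_Recv for 3 longs from rank 1 with tag 32767 on MPI_COMM_WORLD. This is an illustration only; the buffer name and the matching send on rank 1 are assumptions, not the actual GIREF code.

/* Illustrative sketch only -- not the GIREF application code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    long values[3] = {0, 0, 0};   /* stand-in for the 3-long buffer seen in frame #10 */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 blocks here, spinning in opal_progress()/sched_yield(),
         * which is where the reported deadlock is observed. */
        MPI_Recv(values, 3, MPI_LONG, 1, 32767, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 0 received %ld %ld %ld\n", values[0], values[1], values[2]);
    } else if (rank == 1) {
        /* Assumed matching send; if it never matches or never completes,
         * rank 0 stays stuck in the receive above. */
        MPI_Send(values, 3, MPI_LONG, 0, 32767, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}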

And some information about the configuration:

http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2018.10.17.02h16m02s_config.log

http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2018.10.17.02h16m02s_ompi_info_all.txt

Thanks,

Eric
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel