Folks,

This email contains:
- a description of a problem
- a possible fix that needs review


PROBLEM:
I always get a SIGSEGV when running
mpirun -np 2 --mca btl scif,self ./test_4610

test_4610.c is attached to https://svn.open-mpi.org/trac/ompi/ticket/4610

To reproduce the issue, MPSS must be loaded.
Only MPSS is required; a MIC card is *not* required.


Here is what happens:

ompi_mpi_finalize calls
mca_base_framework_close(&ompi_mpool_base_framework)
at ompi/runtime/ompi_mpi_finalize:411

which ends up crashing when calling

mpool_grdma->resources.deregister_mem
at ompi/mca/mpool/grdma/mpool_grdma_module.c:115

where mpool_grdma->resources.deregister_mem *was* a pointer to scif_dereg_mem.

I wrote *was* rather than *is* because, earlier, ompi_mpi_finalize called

mca_base_framework_close(&ompi_bml_base_framework)
at ompi/runtime/ompi_mpi_finalize:408

which indirectly unloaded the scif btl, and with it the code of
scif_dereg_mem, so deregister_mem was left as a dangling pointer.
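
For illustration, here is a minimal C model of the sequence above. It is hypothetical code, not taken from Open MPI: the `loaded` flag of plugin_t stands in for the scif btl still being mapped, resources_t only mimics the shape of mpool_grdma->resources, and model_scif_dereg_mem stands in for scif_dereg_mem. Instead of jumping through a dangling pointer, call_deregister checks the flag and returns -1, so the model reports the bug rather than crashing. simulate_finalize can replay the two framework closes in either order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not Open MPI code: "loaded" stands in for
 * "the scif btl shared object is still mapped". */
typedef struct {
    bool loaded;
} plugin_t;

/* Same shape as mpool_grdma->resources: a function pointer whose
 * target lives inside another plugin. */
typedef struct {
    const plugin_t *owner;
    int (*deregister_mem)(void *reg);
} resources_t;

/* Stand-in for scif_dereg_mem. */
static int model_scif_dereg_mem(void *reg) {
    (void)reg;
    return 0;
}

/* The real code calls the pointer unconditionally; this model checks
 * the owner first so it reports the bug (-1) instead of crashing. */
static int call_deregister(const resources_t *res, void *reg) {
    if (!res->owner->loaded) {
        return -1; /* in the real run, this call is the SIGSEGV */
    }
    return res->deregister_mem(reg);
}

/* Replays the bml and mpool framework closes in either order and
 * returns the result of the deregistration done by the mpool close. */
int simulate_finalize(bool close_mpool_first) {
    plugin_t scif = { .loaded = true };
    resources_t res = { .owner = &scif,
                        .deregister_mem = model_scif_dereg_mem };
    int rc;

    if (close_mpool_first) {
        rc = call_deregister(&res, NULL); /* mpool close: scif still loaded */
        scif.loaded = false;              /* bml close: scif btl unloaded */
    } else {
        scif.loaded = false;              /* bml close (current order) */
        rc = call_deregister(&res, NULL); /* mpool close: dangling pointer */
    }
    return rc;
}
```

simulate_finalize(false) follows the current order and returns -1; simulate_finalize(true) closes the mpool framework first and returns 0.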



POSSIBLE FIX:

A naive approach is to call
mca_base_framework_close(&ompi_mpool_base_framework)
*before*
mca_base_framework_close(&ompi_bml_base_framework)
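
One way to frame the review question: a framework that keeps function pointers into code unloaded by another framework's close must be closed first. The sketch below checks a close order against such constraints. It is a hypothetical helper, not part of Open MPI (close_constraint_t and close_order_is_safe are made-up names). Only the "mpool before bml" constraint is established by the analysis above; any other constraint would have to come from people who know the teardown paths.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: "user" holds function pointers into code that closing
 * "provider" unloads, so "user" must be closed before "provider". */
typedef struct {
    const char *user;
    const char *provider;
} close_constraint_t;

/* Index of name in the close order, or -1 if it is absent. */
static int position_of(const char *const *order, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(order[i], name) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Returns true if the close order satisfies every constraint.
 * Constraints naming a framework absent from the order are ignored. */
bool close_order_is_safe(const char *const *order, size_t n,
                         const close_constraint_t *cons, size_t ncons) {
    for (size_t i = 0; i < ncons; i++) {
        int u = position_of(order, n, cons[i].user);
        int p = position_of(order, n, cons[i].provider);
        if (u >= 0 && p >= 0 && u > p) {
            return false; /* user closed after its provider: dangling pointer */
        }
    }
    return true;
}
```

With the single constraint { "mpool", "bml" }, the order { "bml", "mpool" } is rejected and { "mpool", "bml" } is accepted. If it turned out (hypothetically) that some btl still needs its mpool while the bml framework is closing, the opposite constraint { "bml", "mpool" } would also apply and neither order would pass; that is the kind of consequence of the swap that needs checking.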

Although I ran only a few tests and did not hit any issue, I simply do
not know whether this is the right thing to do, nor what the
consequences of swapping these two calls could be.

Could someone please review and comment on this?

Thanks in advance,

Gilles
