Folks,

this email contains:
- the description of a problem
- a possible fix that requires a review

PROBLEM:

I always get a SIGSEGV when running

    mpirun -np 2 --mca btl scif,self ./test_4610

test_4610.c is attached to https://svn.open-mpi.org/trac/ompi/ticket/4610

In order to reproduce the issue, MPSS must be loaded.
/* only MPSS is required, MIC is *not* required */

Here is what happens:

ompi_mpi_finalize calls mca_base_framework_close(&ompi_mpool_base_framework) at ompi/runtime/ompi_mpi_finalize:411

That ends up crashing when executing mpool_grdma->resources.deregister_mem at ompi/mca/mpool/grdma/mpool_grdma_module.c:115, where mpool_grdma->resources.deregister_mem *was* scif_dereg_mem.

I wrote *was* and not *is* because, before that, ompi_mpi_finalize called mca_base_framework_close(&ompi_bml_base_framework) at ompi/runtime/ompi_mpi_finalize:408, which indirectly unloaded the scif btl (and hence the scif_dereg_mem function). The function pointer stored in the mpool therefore points into code that is no longer mapped by the time the mpool framework is closed.

POSSIBLE FIX:

A naive approach is to call mca_base_framework_close(&ompi_mpool_base_framework) *before* mca_base_framework_close(&ompi_bml_base_framework).

I ran only a few tests and did not experience any issue, but I simply do not know whether this is the right thing to do, or what the consequences of swapping these two calls could be.

Could someone please review and comment on this?

Thanks in advance,

Gilles
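
P.S. To make the ordering issue concrete, here is a toy sketch in C. It is not Open MPI code: every toy_* name is hypothetical, and "unloading the btl" is modelled with a flag rather than a real dlclose, so that the bad ordering returns -1 instead of crashing. It only shows why a callback owned by one component must be used before that component goes away.

```c
#include <stddef.h>

/* Hypothetical stand-in for the callback stored in mpool_grdma->resources. */
typedef int (*toy_dereg_fn_t)(void *reg);

struct toy_mpool {
    toy_dereg_fn_t deregister_mem; /* provided by the (toy) btl */
    int registrations;             /* registrations still outstanding */
};

/* Assumption: a flag models whether the btl code is still loaded. */
static int toy_btl_loaded = 0;

/* Toy counterpart of scif_dereg_mem; does nothing and reports success. */
static int toy_scif_dereg_mem(void *reg)
{
    (void)reg;
    return 0;
}

/* Loading the toy btl installs its callback into the mpool. */
static void toy_btl_open(struct toy_mpool *mpool)
{
    toy_btl_loaded = 1;
    mpool->deregister_mem = toy_scif_dereg_mem;
    mpool->registrations = 1;
}

/* Models closing the bml framework, which indirectly unloads the btl. */
static void toy_bml_close(void)
{
    toy_btl_loaded = 0;
}

/* Models closing the mpool framework: releases every registration through
 * the btl callback. Returns -1 where the real code would jump into
 * unloaded code and crash, 0 otherwise. */
static int toy_mpool_close(struct toy_mpool *mpool)
{
    while (mpool->registrations > 0) {
        if (!toy_btl_loaded) {
            return -1; /* callback target is gone */
        }
        mpool->deregister_mem(NULL);
        mpool->registrations--;
    }
    return 0;
}

/* mpool_first == 0: bml closed first, then mpool (the failing order).
 * mpool_first != 0: mpool closed first, then bml (the proposed swap). */
int toy_finalize(int mpool_first)
{
    struct toy_mpool mpool;
    int rc;

    toy_btl_open(&mpool);
    if (mpool_first) {
        rc = toy_mpool_close(&mpool);
        toy_bml_close();
    } else {
        toy_bml_close();
        rc = toy_mpool_close(&mpool);
    }
    return rc;
}
```

In this toy model, toy_finalize(0) reports the failure and toy_finalize(1) succeeds. Whether the same swap is safe in the real ompi_mpi_finalize is exactly the question above, since other frameworks may depend on the current order.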