On Aug 13, 2007, at 4:04 PM, Gleb Natapov wrote:

The mpool rdma finalize was an empty function. I changed it to do the actual
"finalize" job: go over all registered segments in the mpool and release
them one by one.
The mpool uses a reference counter for each memory region, which prevents
a double-free bug. In the openib BTL, all memory that was registered with
the mpool is also unregistered through the mpool during the finalize stage.
So perhaps in gm the memory that was registered with the mpool is released
directly (not via the mpool), and that causes the segfault.
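To illustrate the reference-counting argument, here is a minimal sketch of an mpool whose finalize walks every registered region and drops its references, invoking the owner's deregistration callback only when a region's count reaches zero. All names here (`region_t`, `mpool_t`, `mpool_release`, `mpool_finalize`, `fake_dereg`, `demo_finalize`) are hypothetical and are not the real Open MPI mpool API. The sketch assumes only that the mpool stores a per-region count and a BTL-supplied callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the BTL's deregistration hook. */
typedef void (*dereg_fn_t)(void *base, size_t len);

typedef struct region {
    void          *base;
    size_t         len;
    int            refcount;  /* outstanding users of this registration */
    struct region *next;      /* singly linked list of live registrations */
} region_t;

typedef struct {
    region_t  *regions;
    dereg_fn_t dereg;         /* supplied by the BTL that owns the memory */
} mpool_t;

static int dereg_calls = 0;   /* counts real deregistrations in this sketch */

static void fake_dereg(void *base, size_t len) {
    (void)base;
    (void)len;
    dereg_calls++;
}

/* Drop one reference. The BTL callback runs only when the count hits zero,
   so a region shared by several users is deregistered exactly once. */
static void mpool_release(mpool_t *mp, region_t *r) {
    assert(r->refcount > 0);  /* releasing an already-freed region is a bug */
    if (--r->refcount == 0) {
        mp->dereg(r->base, r->len);
    }
}

/* The "finalize" job: walk every registered segment and release it. */
static void mpool_finalize(mpool_t *mp) {
    region_t *r = mp->regions;
    while (r != NULL) {
        region_t *next = r->next;
        while (r->refcount > 0) {
            mpool_release(mp, r);
        }
        r = next;
    }
    mp->regions = NULL;
}

/* Usage: two regions, one shared by two users. */
static int demo_finalize(void) {
    static char buf_a[64], buf_b[64];
    region_t b = { buf_b, sizeof buf_b, 1, NULL };
    region_t a = { buf_a, sizeof buf_a, 2, &b };
    mpool_t mp = { &a, fake_dereg };
    dereg_calls = 0;
    mpool_finalize(&mp);
    return dereg_calls;  /* 2: one call per region, not per reference */
}
```

The count is what keeps finalize safe to run after normal teardown: a region whose users already released it has a count of zero and is skipped. The sketch cannot protect against memory freed directly behind the mpool's back, which is the gm scenario suspected above.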

As far as I understand, the problem Tim sees is much more serious. During
finalize, the gm BTL is unloaded, and only after that is mpool finalize
called. The mpool uses callbacks into the gm BTL to register/unregister
memory, but by then the BTL is already gone.
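The ordering problem can be shown with a toy model. Here a `loaded` flag stands in for whether the BTL's code is still present. In the real case, the callback pointer would point into an unloaded component and would typically segfault rather than fail cleanly. Every name below (`btl_t`, `mpool_t`, `btl_dereg_cb`, `teardown_btl_first`, `teardown_mpool_first`) is hypothetical and does not reflect actual Open MPI or gm code. The sketch also assumes the simplest remedy, finalizing the mpool while its owning BTL still exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the gm BTL module. */
typedef struct {
    bool loaded;        /* models "component code is still mapped" */
    int  deregistered;  /* how many regions the BTL actually released */
} btl_t;

typedef struct {
    btl_t *owner;         /* BTL whose callbacks the mpool invokes */
    int    live_regions;  /* registrations still held by the mpool */
} mpool_t;

/* Deregistration callback provided by the BTL. Returns false if the BTL
   is gone (in real life this call would crash instead). */
static bool btl_dereg_cb(btl_t *btl) {
    if (!btl->loaded) {
        return false;
    }
    btl->deregistered++;
    return true;
}

/* mpool finalize: release every remaining region through the BTL callback. */
static bool mpool_finalize(mpool_t *mp) {
    while (mp->live_regions > 0) {
        if (!btl_dereg_cb(mp->owner)) {
            return false;  /* callback target no longer exists */
        }
        mp->live_regions--;
    }
    return true;
}

static void btl_unload(btl_t *btl) { btl->loaded = false; }

/* Order described above: BTL unloaded first, mpool finalized afterwards. */
static bool teardown_btl_first(void) {
    btl_t   btl = { true, 0 };
    mpool_t mp  = { &btl, 3 };
    btl_unload(&btl);
    return mpool_finalize(&mp);  /* fails: the callbacks point at nothing */
}

/* Safe order: drain the mpool while the BTL can still deregister memory. */
static bool teardown_mpool_first(void) {
    btl_t   btl = { true, 0 };
    mpool_t mp  = { &btl, 3 };
    bool ok = mpool_finalize(&mp);
    btl_unload(&btl);
    return ok && btl.deregistered == 3;
}
```

The design point is that whichever component owns the registration callbacks must outlive the last call into them. Draining the mpool from the BTL's own finalize path is one way to guarantee that.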

Right. We had the same problem in the openib btl, too. See https://svn.open-mpi.org/trac/ompi/changeset/15735.

I don't know if this is the exact same scenario Tim is running into, but the end result is the same (the openib btl was being destroyed while still leaving memory registered in the mpool).

--
Jeff Squyres
Cisco Systems
