Jeff Squyres wrote:
FWIW: we fixed this recently in the openib BTL by ensuring that all registered memory is freed during the BTL finalize (vs. the mpool finalize).

This is a new issue because the mpool finalize was just recently expanded to un-register all of its memory as part of the NIC-restart effort (and will likely also be needed for checkpoint/restart...?).
The mpool rdma finalize used to be an empty function. I changed it to do the actual "finalize" job: it goes over all registered segments in the mpool and releases them one by one. The mpool keeps a reference counter for each memory region, which protects us from a double-free bug. In the openib BTL, all memory that was registered with the mpool is also unregistered through the mpool at the finalize stage. So maybe in gm the memory (that was registered with the mpool) is released directly (not via the mpool), and that causes the segfault.
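
To make the reference-counting idea concrete, here is a minimal sketch in C. It is not the actual mpool rdma code: every name in it (`sketch_reg_t`, `sketch_mpool_t`, `sketch_register`, `sketch_release`, `sketch_finalize`) is hypothetical, and the "deregister" step is just a counter. It only illustrates how a finalize that walks the list can avoid releasing a region that was already released through the pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical registration record; not Open MPI's real structure. */
typedef struct sketch_reg {
    void *base;
    size_t len;
    int ref_count;
    struct sketch_reg *next;
} sketch_reg_t;

/* Hypothetical pool: a linked list of live registrations plus counters. */
typedef struct {
    sketch_reg_t *head;
    int live;         /* registrations still tracked by the pool */
    int dereg_calls;  /* how often the stand-in "deregister" ran */
} sketch_mpool_t;

/* Track a new region with an initial reference count of 1. */
sketch_reg_t *sketch_register(sketch_mpool_t *mp, void *base, size_t len)
{
    sketch_reg_t *r = malloc(sizeof(*r));
    if (NULL == r) {
        return NULL;
    }
    r->base = base;
    r->len = len;
    r->ref_count = 1;
    r->next = mp->head;
    mp->head = r;
    mp->live++;
    return r;
}

/* Drop one reference. Only when the count reaches zero is the region
 * unlinked and "deregistered". Returns 1 if the region was freed. */
int sketch_release(sketch_mpool_t *mp, sketch_reg_t *reg)
{
    sketch_reg_t **pp = &mp->head;

    assert(reg->ref_count > 0);
    if (--reg->ref_count > 0) {
        return 0;
    }
    while (*pp != NULL && *pp != reg) {
        pp = &(*pp)->next;
    }
    if (*pp == reg) {
        *pp = reg->next;
    }
    mp->dereg_calls++;  /* stand-in for the real NIC deregistration */
    mp->live--;
    free(reg);
    return 1;
}

/* Finalize: release whatever is still on the list, one by one.
 * Regions released earlier are no longer on the list, so they are
 * never touched a second time. */
void sketch_finalize(sketch_mpool_t *mp)
{
    while (mp->head != NULL) {
        sketch_reg_t *r = mp->head;
        while (!sketch_release(mp, r)) {
            /* keep dropping references until the region is freed */
        }
    }
}
```

The point of the sketch is that this protection only holds if every release goes through the pool. A region freed behind the pool's back would still be on the list when finalize runs.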

Pasha



On Aug 13, 2007, at 9:11 AM, Tim Prins wrote:

Hi folks,

I have run into a problem with mca_mpool_rdma_finalize as implemented in r15557. With the t_win onesided test, running over gm, it segfaults. What appears to be happening is that some memory is registered with gm, and then gets freed by mca_mpool_rdma_finalize. But the free function that it is using is in the gm btl, and the btls are unloaded before the mpool is shut down. So the function call segfaults.

If I change the code so we never unload the btls (and we don't free the gm
port), it works fine.
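
For readers following along, the ordering issue can be sketched roughly like this in C. This is a simplified model, not the gm BTL or mpool code: all names (`sketch_pool_t`, `sketch_btl_t`, `sketch_gm_dereg`, and so on) are hypothetical, and "unloading" is modelled by a flag rather than by actually unmapping a component. It shows the approach of releasing registrations during the BTL finalize, while the deregistration callback is still valid, so that the later pool finalize has nothing left to call.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_REGS 8

/* Callback that lives in the (hypothetical) BTL component. */
typedef void (*sketch_dereg_fn)(void *base);

/* Hypothetical pool: each entry remembers which callback frees it. */
typedef struct {
    void *base[SKETCH_MAX_REGS];
    sketch_dereg_fn dereg[SKETCH_MAX_REGS];
    int count;
} sketch_pool_t;

/* Hypothetical BTL handle. */
typedef struct {
    int loaded;           /* 1 while the component code is still mapped */
    sketch_pool_t *pool;
} sketch_btl_t;

int sketch_dereg_calls = 0;

/* Stand-in for the BTL's deregistration function. */
void sketch_gm_dereg(void *base)
{
    (void)base;
    sketch_dereg_calls++;
}

/* Record a registration and the callback needed to undo it. */
int sketch_pool_add(sketch_pool_t *p, void *base, sketch_dereg_fn fn)
{
    if (p->count >= SKETCH_MAX_REGS) {
        return -1;
    }
    p->base[p->count] = base;
    p->dereg[p->count] = fn;
    p->count++;
    return 0;
}

/* BTL finalize: release everything while the callback is still valid,
 * then mark the component as unloaded. */
void sketch_btl_finalize(sketch_btl_t *btl)
{
    sketch_pool_t *p = btl->pool;
    while (p->count > 0) {
        p->count--;
        p->dereg[p->count](p->base[p->count]);
    }
    btl->loaded = 0;  /* stands in for the component being unloaded */
}

/* Pool finalize: returns how many entries it still had to release.
 * In the failing scenario this would be non-zero after the BTL is
 * gone, and each call would go through a dangling function pointer. */
int sketch_pool_finalize(sketch_pool_t *p)
{
    int released = p->count;
    while (p->count > 0) {
        p->count--;
        p->dereg[p->count](p->base[p->count]);
    }
    return released;
}
```

With this ordering, the pool finalize runs after the BTL is gone but finds an empty list, so it never calls into unloaded code.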

Note that the openib btl works just fine.

Forgive me if this is a known problem, I am trying to catch up from my
vacation...

Tim

---
If anyone cares, here is the callstack:
(gdb) bt
#0  0x404de825 in ?? () from /lib/libgcc_s.so.1
#1  0x4048081a in mca_mpool_rdma_finalize (mpool=0x925b690)
    at mpool_rdma_module.c:431
#2  0x400caca9 in mca_mpool_base_close () at base/mpool_base_close.c:57
#3  0x40060094 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:304
#4  0x4009a4c9 in PMPI_Finalize () at pfinalize.c:44
#5  0x08049946 in main (argc=1, argv=0xbfe16924) at t_win.c:214
(gdb)
gdb shows that at this point the gm btl is no longer loaded.
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


