Jeff Squyres wrote:
FWIW: we fixed this recently in the openib BTL by ensuring that all
registered memory is freed during the BTL finalize (vs. the mpool
finalize).
This is a new issue because the mpool finalize was just recently
expanded to un-register all of its memory as part of the NIC-restart
effort (and will likely also be needed for checkpoint/restart...?).
The mpool rdma finalize was an empty function. I changed it to do the
"finalize" job: go over all registered segments in the mpool and release
them one by one.
The mpool uses a reference counter for each memory region, which protects
us from double-free bugs. In the openib BTL, all memory that was registered
through the mpool is also unregistered through the mpool during the
finalize stage.
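To make the reference-counting idea concrete, here is a minimal C sketch of a registration cache whose finalize walks the remaining segments and releases each one exactly once. It is an illustration under stated assumptions, not the real code: all names (mpool_t, reg_segment_t, mpool_register, mpool_release, mpool_finalize, fake_deregister) are hypothetical and are not the Open MPI mpool API, and the deregister callback stands in for whatever function the BTL would supply.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names: this is NOT the Open MPI mpool API. */
typedef struct reg_segment {
    void *base;
    size_t len;
    int refcount;                 /* number of users holding this registration */
    struct reg_segment *next;
} reg_segment_t;

typedef int (*dereg_fn_t)(reg_segment_t *seg);

typedef struct {
    reg_segment_t *head;          /* list of currently registered segments */
    dereg_fn_t deregister;        /* assumed to be supplied by the owning BTL */
} mpool_t;

static int deregistered_count = 0;

/* Stand-in for a BTL's real deregistration routine. */
static int fake_deregister(reg_segment_t *seg) {
    (void)seg;
    deregistered_count++;
    return 0;
}

/* Register: bump the refcount if the region is already known, else add it. */
static reg_segment_t *mpool_register(mpool_t *mp, void *base, size_t len) {
    for (reg_segment_t *s = mp->head; s != NULL; s = s->next) {
        if (s->base == base && s->len == len) {
            s->refcount++;
            return s;
        }
    }
    reg_segment_t *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->base = base;
    s->len = len;
    s->refcount = 1;
    s->next = mp->head;
    mp->head = s;
    return s;
}

/* Release: deregister only when the last reference goes away, so a
 * region shared by several users is never deregistered twice. */
static void mpool_release(mpool_t *mp, reg_segment_t *seg) {
    assert(seg->refcount > 0);
    if (--seg->refcount > 0) return;
    reg_segment_t **pp = &mp->head;
    while (*pp != NULL && *pp != seg) pp = &(*pp)->next;
    if (*pp != NULL) *pp = seg->next;   /* unlink from the list */
    mp->deregister(seg);
    free(seg);
}

/* Finalize: walk whatever is still registered and release it once. */
static void mpool_finalize(mpool_t *mp) {
    while (mp->head != NULL) {
        reg_segment_t *s = mp->head;
        s->refcount = 1;                /* drop all outstanding references */
        mpool_release(mp, s);
    }
}
```

Because release only calls the deregister callback when the last reference is dropped, a region registered twice is unregistered once, and finalize only touches segments that are still live.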
So maybe in gm the memory (that was registered with the mpool) is released
directly (not via the mpool), and that causes the segfault.
Pasha
On Aug 13, 2007, at 9:11 AM, Tim Prins wrote:
Hi folks,
I have run into a problem with mca_mpool_rdma_finalize as implemented in
r15557. With the t_win onesided test, running over gm, it segfaults. What
appears to be happening is that some memory is registered with gm, and then
gets freed by mca_mpool_rdma_finalize. But the free function that it is
using is in the gm btl, and the btls are unloaded before the mpool is shut
down. So the function call segfaults.
If I change the code so we never unload the btls (and we don't free the gm
port), it works fine.
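The ordering problem can be modelled in a few lines of C. This is a simplified sketch, assuming the mpool holds a function pointer into the BTL and that unloading the BTL simply makes that pointer unsafe to call; the unload and the segfault are modelled with a flag and a counter rather than a real dlclose and crash. All names are hypothetical. The second shutdown sequence follows the openib approach Jeff describes: the BTL releases its registrations in its own finalize, so the mpool finalize has nothing left to call into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: names are illustrative, not Open MPI's API. */
typedef struct seg { struct seg *next; } seg_t;

typedef struct {
    seg_t *head;
    void (*deregister)(seg_t *);    /* points into the BTL component */
} pool_t;

static bool btl_loaded = true;
static int calls_into_unloaded_btl = 0;  /* stands in for the segfault */
static int clean_deregistrations = 0;

/* "Lives in the BTL": only safe to call while the BTL is loaded. */
static void btl_deregister(seg_t *s) {
    (void)s;
    if (!btl_loaded) {
        calls_into_unloaded_btl++;
        return;
    }
    clean_deregistrations++;
}

static void pool_add(pool_t *p) {
    seg_t *s = malloc(sizeof *s);
    if (s == NULL) abort();
    s->next = p->head;
    p->head = s;
}

/* Release every segment through the pool's deregister callback. */
static void pool_drain(pool_t *p) {
    while (p->head != NULL) {
        seg_t *s = p->head;
        p->head = s->next;
        p->deregister(s);
        free(s);
    }
}

/* Order seen in the backtrace: BTLs unloaded first, then mpool finalize. */
static void shutdown_unload_then_finalize(pool_t *p) {
    btl_loaded = false;   /* BTL closed and unloaded */
    pool_drain(p);        /* mpool finalize calls into unloaded code */
}

/* openib-style fix: the BTL releases its registrations in its own finalize. */
static void shutdown_release_then_unload(pool_t *p) {
    pool_drain(p);              /* BTL finalize: release while still loaded */
    assert(p->head == NULL);    /* nothing left registered */
    btl_loaded = false;         /* now safe to unload the BTL */
    pool_drain(p);              /* mpool finalize: no-op */
}
```

In the first sequence every deregistration lands in unloaded code; in the second, all of them happen while the BTL is still loaded.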
Note that the openib btl works just fine.
Forgive me if this is a known problem, I am trying to catch up from my
vacation...
Tim
---
If anyone cares, here is the callstack:
(gdb) bt
#0 0x404de825 in ?? () from /lib/libgcc_s.so.1
#1 0x4048081a in mca_mpool_rdma_finalize (mpool=0x925b690) at mpool_rdma_module.c:431
#2 0x400caca9 in mca_mpool_base_close () at base/mpool_base_close.c:57
#3 0x40060094 in ompi_mpi_finalize () at runtime/ompi_mpi_finalize.c:304
#4 0x4009a4c9 in PMPI_Finalize () at pfinalize.c:44
#5 0x08049946 in main (argc=1, argv=0xbfe16924) at t_win.c:214
(gdb)
gdb shows that at this point the gm btl is no longer loaded.
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel