Neil -- Many thanks for the detailed report!
Our Memory Hooks Guy(tm) (Brian Barrett) is in-processing at his summer internship over the next 24-48 hours; this may delay the reply a little bit. In the meantime, two rough sketches are appended below the quoted report: one of the hook install/unlink pattern a fix would likely follow, and one of a possible interpreter-side workaround.

> -----Original Message-----
> From: devel-boun...@open-mpi.org
> [mailto:devel-boun...@open-mpi.org] On Behalf Of Neil Ludban
> Sent: Monday, May 22, 2006 6:36 PM
> To: de...@open-mpi.org
> Subject: [OMPI devel] memory_malloc_hooks.c and dlclose()
>
> Hello,
>
> I'm getting a core dump when using openmpi-1.0.2 with the MPI extensions
> we're developing for the MATLAB interpreter. This same build of openmpi
> is working great with C programs and our extensions for GNU Octave. The
> machine is AMD64 running Linux:
>
> Linux kodos 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST 2005
> x86_64 x86_64 x86_64 GNU/Linux
>
> I believe there's a bug in that opal_memory_malloc_hooks_init() links
> itself into the __free_hook chain during initialization, but then it
> never unlinks itself at shutdown. In the interpreter environment,
> libopal.so is dlclose()d and unmapped from memory long before the
> interpreter is done with dynamic memory. A quick check of the nightly
> trunk snapshot reveals some function name changes, but no new shutdown
> code.
>
> After running this trivial MPI program on a single processor:
>
>     MPI_Init();
>     MPI_Finalize();
>
> I'm back to the MATLAB prompt, and break into the debugger:
>
> >>> ^C
> (gdb) info sharedlibrary
> From                To                  Syms Read   Shared Object Library
> ...
> 0x0000002aa0b50740  0x0000002aa0b50a28  Yes         .../mexMPI_Init.mexa64
> 0x0000002aa0c52a50  0x0000002aa0c54318  Yes         .../lib/libbcmpi.so.0
> 0x0000002aa0dcef90  0x0000002aa0e37398  Yes         /usr/lib64/libstdc++.so.6
> 0x0000002aa0fa9ec0  0x0000002aa102e118  Yes         .../lib/libmpi.so.0
> 0x0000002aa1178560  0x0000002aa11af708  Yes         .../lib/liborte.so.0
> 0x0000002aa12cffb0  0x0000002aa12f2988  Yes         .../lib/libopal.so.0
> 0x0000002aa1424180  0x0000002aa14249d8  Yes         /lib64/libutil.so.1
> 0x0000002aa152a760  0x0000002aa1536368  Yes         /lib64/libnsl.so.1
> 0x0000002aa3540b80  0x0000002aa3551077  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libvapi.so
> 0x0000002aa365e0a0  0x0000002aa3664a86  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libmosal.so
> 0x0000002aa470db50  0x0000002aa4719438  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/librhhul.so
> 0x0000002ac4e508c0  0x0000002ac4e50ed8  Yes         .../mexMPI_Constants.mexa64
> 0x0000002ac4f52740  0x0000002ac4f52a28  Yes         .../mexMPI_Finalize.mexa64
>
> (gdb) c
> >> exit
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 182992729024 (LWP 21848)]
> opal_mem_free_free_hook (ptr=0x7fbfff96d0, caller=0xa8d4f8)
>     at memory_malloc_hooks.c:65
>
> (gdb) info sharedlibrary
> From                To                  Syms Read   Shared Object Library
> ...
> 0x0000002aa1424180  0x0000002aa14249d8  Yes         /lib64/libutil.so.1
> 0x0000002aa152a760  0x0000002aa1536368  Yes         /lib64/libnsl.so.1
> 0x0000002aa3540b80  0x0000002aa3551077  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libvapi.so
> 0x0000002aa365e0a0  0x0000002aa3664a86  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libmosal.so
> 0x0000002aa470db50  0x0000002aa4719438  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/librhhul.so
>
> (gdb) list
> 63      static void
> 64      opal_mem_free_free_hook (void *ptr, const void *caller)
> 65      {
> 66          /* dispatch about the pending free */
> 67          opal_mem_free_release_hook(ptr, malloc_usable_size(ptr));
> 68
> 69          __free_hook = old_free_hook;
> 70
> 71          /* call the next chain down */
> 72          free(ptr);
> 73
> 74          /* save the hooks again and restore our hook again */
>
> (gdb) print ptr
> $2 = (void *) 0x7fbfff96d0
> (gdb) print caller
> $3 = (const void *) 0xa8d4f8
> (gdb) print __free_hook
> $4 = (void (*)(void *, const void *)) 0x2aa12f1d79 <opal_mem_free_free_hook>
> (gdb) print old_free_hook
> Cannot access memory at address 0x2aa1422800
>
> Before I start blindly hacking a workaround, can somebody who's familiar
> with the openmpi internals verify that this is a real bug, suggest a
> correct fix, and/or comment on other potential problems with running in
> an interpreter.
>
> Thanks-
>
> -Neil
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
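
First, a rough sketch of the install/unlink pattern a fix would likely follow, assuming the (now-deprecated) glibc __free_hook interface that the code above uses. This is not the actual Open MPI source; the names my_memory_hooks_init(), my_memory_hooks_finalize(), and my_free_hook() are invented for illustration. The point is the finalize step: remember the previous hook at init time and put it back before libopal.so can be unloaded.

    /* Sketch only -- not the Open MPI implementation. */
    #include <malloc.h>   /* __free_hook, malloc_usable_size (glibc) */
    #include <stdlib.h>   /* free() */

    /* whatever hook was installed before ours (often NULL) */
    static void (*old_free_hook)(void *, const void *) = NULL;

    static void my_free_hook(void *ptr, const void *caller)
    {
        /* a real implementation would dispatch the pending free here,
         * e.g. release_hook(ptr, malloc_usable_size(ptr)) */
        (void) caller;

        /* restore the previous hook so free() does not recurse into us */
        __free_hook = old_free_hook;
        free(ptr);
        __free_hook = my_free_hook;
    }

    void my_memory_hooks_init(void)
    {
        old_free_hook = __free_hook;    /* remember what was there */
        __free_hook   = my_free_hook;   /* splice ourselves in     */
    }

    /* The step the report says is missing: unlink before the library
     * can be dlclose()d, otherwise __free_hook is left pointing at a
     * text segment that may no longer be mapped. */
    void my_memory_hooks_finalize(void)
    {
        if (__free_hook == my_free_hook) {
            __free_hook = old_free_hook;
        }
        /* if another component installed its own hook on top of ours
         * after init, blindly restoring old_free_hook would unlink it
         * too; a real fix needs to consider that case */
    }

One complication worth noting: if some other library hooks __free_hook after Open MPI does, the simple restore above silently unlinks it, so whatever goes into the real shutdown path will probably need to be more defensive than this.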
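
Second, a possible stopgap on the interpreter side until a proper shutdown path exists, under the assumption that your MEX wrapper controls the dlclose() of libopal.so: before closing the library, check whether the glibc hooks still point into it (dladdr() can report which shared object an address belongs to) and clear any that do. clear_stale_hooks() and points_into() are hypothetical helpers, not existing Open MPI or glibc functions.

    /* Sketch of a blunt interpreter-side workaround, not a vetted fix.
     * Call clear_stale_hooks() immediately before dlclose()ing the
     * handle for libopal.so.  Link with -ldl. */
    #define _GNU_SOURCE
    #include <dlfcn.h>    /* dladdr(), Dl_info */
    #include <malloc.h>   /* __malloc_hook, __free_hook, __realloc_hook */
    #include <string.h>

    /* nonzero if addr lies inside an object whose path contains soname;
     * the function-to-void* casts at the call sites are a glibc-ism */
    static int points_into(const void *addr, const char *soname)
    {
        Dl_info info;

        if (addr == NULL || dladdr(addr, &info) == 0 || info.dli_fname == NULL)
            return 0;
        return strstr(info.dli_fname, soname) != NULL;
    }

    void clear_stale_hooks(void)
    {
        if (points_into((const void *) __free_hook, "libopal"))
            __free_hook = NULL;      /* fall back to plain free()   */
        if (points_into((const void *) __malloc_hook, "libopal"))
            __malloc_hook = NULL;    /* and plain malloc()          */
        if (points_into((const void *) __realloc_hook, "libopal"))
            __realloc_hook = NULL;
    }

This is crude: it throws away whatever hook libopal had chained to (usually NULL in a stock glibc), but it should at least keep a later free() in the interpreter from jumping into an unmapped text segment.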