Neil --

Many thanks for the detailed report!

Our Memory Hooks Guy(tm) (Brian Barrett) is in-processing at his summer
internship over the next 24-48 hours, so this may delay the reply a
little bit.
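In the meantime, here is a rough sketch of the install/uninstall pairing
that the report below says is missing. It is not the Open MPI code and not
a proposed patch: free_hook_slot is a hypothetical stand-in for glibc's
__free_hook (so the sketch compiles even on newer glibc releases that no
longer provide the malloc hooks), and every sketch_* name is invented for
illustration. The point it shows is that whoever links a hook into the
chain has to restore the saved pointer before the library is dlclose()d,
and can only do so safely if it is still at the head of the chain.

```c
#include <stddef.h>
#include <stdlib.h>

/* ASSUMPTION: free_hook_slot is a stand-in for glibc's __free_hook.
 * All sketch_* names are hypothetical, not Open MPI symbols. */
typedef void (*free_hook_fn)(void *ptr, const void *caller);

static free_hook_fn free_hook_slot = NULL; /* plays the role of __free_hook */
static free_hook_fn old_free_hook = NULL;  /* hook that was there before us */
static int hook_installed = 0;
static int release_calls = 0;              /* counts "release" notifications */

/* Mimics how free() consults the hook slot before doing the real work. */
static void sketch_free(void *ptr)
{
    if (free_hook_slot != NULL) {
        free_hook_slot(ptr, NULL);
    } else {
        free(ptr);
    }
}

/* Same shape as the hook in the quoted gdb listing: notify, step aside,
 * call the next hook down the chain, then put ourselves back. */
static void sketch_free_hook(void *ptr, const void *caller)
{
    (void)caller;
    release_calls++;
    free_hook_slot = old_free_hook;
    sketch_free(ptr);
    old_free_hook = free_hook_slot;
    free_hook_slot = sketch_free_hook;
}

/* Link into the chain (what happens at init time in the report). */
static void sketch_hooks_init(void)
{
    if (hook_installed) {
        return;
    }
    old_free_hook = free_hook_slot;
    free_hook_slot = sketch_free_hook;
    hook_installed = 1;
}

/* The missing half: unlink at shutdown. Returns 0 on success, -1 if some
 * other code chained a hook on top of ours, in which case we cannot
 * safely remove ourselves and the library must not be unmapped. */
static int sketch_hooks_fini(void)
{
    if (!hook_installed) {
        return 0;
    }
    if (free_hook_slot != sketch_free_hook) {
        return -1;
    }
    free_hook_slot = old_free_hook;
    old_free_hook = NULL;
    hook_installed = 0;
    return 0;
}
```

With real glibc hooks, sketch_hooks_fini() would have to run from the
finalize path (or a library destructor) before the interpreter unloads
the library. Whether that is the right fix for Open MPI is for Brian to
say.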


> -----Original Message-----
> From: devel-boun...@open-mpi.org 
> [mailto:devel-boun...@open-mpi.org] On Behalf Of Neil Ludban
> Sent: Monday, May 22, 2006 6:36 PM
> To: de...@open-mpi.org
> Subject: [OMPI devel] memory_malloc_hooks.c and dlclose()
> 
> Hello,
> 
> I'm getting a core dump when using openmpi-1.0.2 with the MPI extensions
> we're developing for the MATLAB interpreter.  This same build of openmpi
> is working great with C programs and our extensions for gnu octave.  The
> machine is AMD64 running Linux:
> 
> Linux kodos 2.6.9-5.ELsmp #1 SMP Wed Jan 5 19:29:47 EST 2005 x86_64 x86_64 x86_64 GNU/Linux
> 
> I believe there's a bug in that opal_memory_malloc_hooks_init() links
> itself into the __free_hook chain during initialization, but then it
> never unlinks itself at shutdown.  In the interpreter environment,
> libopal.so is dlclose()d and unmapped from memory long before the
> interpreter is done with dynamic memory.  A quick check of the nightly
> trunk snapshot reveals some function name changes, but no new shutdown
> code.
> 
> After running this trivial MPI program on a single processor:
>       MPI_Init();
>       MPI_Finalize();
> I'm back to the MATLAB prompt, and break into the debugger:
> 
> >>> ^C
> (gdb) info sharedlibrary
> From                To                  Syms Read   Shared Object Library
> ...
> 0x0000002aa0b50740  0x0000002aa0b50a28  Yes         .../mexMPI_Init.mexa64
> 0x0000002aa0c52a50  0x0000002aa0c54318  Yes         .../lib/libbcmpi.so.0
> 0x0000002aa0dcef90  0x0000002aa0e37398  Yes         /usr/lib64/libstdc++.so.6
> 0x0000002aa0fa9ec0  0x0000002aa102e118  Yes         .../lib/libmpi.so.0
> 0x0000002aa1178560  0x0000002aa11af708  Yes         .../lib/liborte.so.0
> 0x0000002aa12cffb0  0x0000002aa12f2988  Yes         .../lib/libopal.so.0
> 0x0000002aa1424180  0x0000002aa14249d8  Yes         /lib64/libutil.so.1
> 0x0000002aa152a760  0x0000002aa1536368  Yes         /lib64/libnsl.so.1
> 0x0000002aa3540b80  0x0000002aa3551077  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libvapi.so
> 0x0000002aa365e0a0  0x0000002aa3664a86  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libmosal.so
> 0x0000002aa470db50  0x0000002aa4719438  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/librhhul.so
> 0x0000002ac4e508c0  0x0000002ac4e50ed8  Yes         .../mexMPI_Constants.mexa64
> 0x0000002ac4f52740  0x0000002ac4f52a28  Yes         .../mexMPI_Finalize.mexa64
> 
> (gdb) c
> >> exit
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 182992729024 (LWP 21848)]
> opal_mem_free_free_hook (ptr=0x7fbfff96d0, caller=0xa8d4f8) at memory_malloc_hooks.c:65
> 
> (gdb) info sharedlibrary
> From                To                  Syms Read   Shared Object Library
> ...
> 0x0000002aa1424180  0x0000002aa14249d8  Yes         /lib64/libutil.so.1
> 0x0000002aa152a760  0x0000002aa1536368  Yes         /lib64/libnsl.so.1
> 0x0000002aa3540b80  0x0000002aa3551077  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libvapi.so
> 0x0000002aa365e0a0  0x0000002aa3664a86  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/libmosal.so
> 0x0000002aa470db50  0x0000002aa4719438  Yes         /usr/local/ibgd-1.8.0/driver/infinihost/lib64/librhhul.so
> 
> (gdb) list
> 63      static void
> 64      opal_mem_free_free_hook (void *ptr, const void *caller)
> 65      {
> 66          /* dispatch about the pending free */
> 67          opal_mem_free_release_hook(ptr, malloc_usable_size(ptr));
> 68
> 69          __free_hook = old_free_hook;
> 70
> 71          /* call the next chain down */
> 72          free(ptr);
> 73
> 74          /* save the hooks again and restore our hook again */
> 
> (gdb) print ptr
> $2 = (void *) 0x7fbfff96d0
> (gdb) print caller
> $3 = (const void *) 0xa8d4f8
> (gdb) print __free_hook
> $4 = (void (*)(void *, const void *)) 0x2aa12f1d79 <opal_mem_free_free_hook>
> (gdb) print old_free_hook
> Cannot access memory at address 0x2aa1422800
> 
> 
> Before I start blindly hacking a workaround, can somebody who's familiar
> with the openmpi internals verify that this is a real bug, suggest a
> correct fix, and/or comment on other potential problems with running in
> an interpreter?
> 
> Thanks-
> 
> -Neil
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
