Hi all,

I'm trying to debug a segfault related to memory management in Gtk2Hs. Rather than make you read the ticket (http://hackage.haskell.org/trac/gtk2hs/ticket/1183), I'll describe the problem:

- compiler: GHC 6.12.1 or 6.12.3
- darcs head of Gtk2Hs with #define DEBUG instead of #undef DEBUG in gtk/Graphics/UI/Gtk/General/hsthread.c
- platform: Ubuntu Linux, x86-64
- to reproduce: cd gtk2hs/gtk/demo/hello, run ghci World.hs and type 'main'

A window with the "Hello World" button appears. After a few seconds, the GC runs and the finaliser of the GtkButton is run, since the Haskell program no longer holds a reference to that object (only the GtkWindow in C land does). Thus, the GC calls a C function, gtk2hs_g_object_unref_from_mainloop, which is supposed to enqueue the object into a global data structure from which objects are later taken and g_object_unref is called on them. This global data structure is protected by a mutex, which is acquired using g_static_mutex_lock:

    void gtk2hs_g_object_unref_from_mainloop(gpointer object) {

      int mutex_locked = 0;
      if (threads_initialised) {
    #ifdef DEBUG
        printf("acquiring lock to add a %s object at %lx\n",
               g_type_name(G_OBJECT_TYPE(object)), (unsigned long) object);
        printf("value of lock function is %lx\n",
               (unsigned long) g_thread_functions_for_glib_use.mutex_lock);
    #endif
        g_rand_new();
    #if defined( WIN32 )
        EnterCriticalSection(&gtk2hs_finalizer_mutex);
    #else
        g_static_mutex_lock(&gtk2hs_finalizer_mutex);
    #endif
        mutex_locked = 1;
      }
      [..]

The program prints:

    acquiring lock to add a GtkButton object at 22d8020
    value of lock function is 0
    zsh: segmentation fault  ghci World

Now the debugging weirdness starts. Whatever I do, I cannot get gdb to find the symbol gtk2hs_g_object_unref_from_mainloop. Since the function above is contained in a C file that comes with our Haskell library, I tried to add "cc-options: -g" and "cc-options: -ggdb -O0", but maybe somewhere symbols are stripped.
So I added the bogus function call to "g_rand_new()", which is not called anywhere else, and gdb stops as follows:

    acquiring lock to add a GtkButton object at 2105020
    value of lock function is 0
    [Switching to Thread 0x7ffff41ff710 (LWP 15735)]

    Breakpoint 12, 0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so

This all seems reasonable, but:

    (gdb) bt
    #0  0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so
    #1  0x00000000419b3792 in ?? ()
    #2  0x00007ffff678f078 in ?? ()

i.e. the calling context is broken. I'm very, very sure that the caller is indeed the above-mentioned function, since g_rand_new isn't called anywhere else in my Haskell program (and otherwise the calling context would be sane). I'm also passing the address of gtk2hs_g_object_unref_from_mainloop as FinalizerPtr to all my ForeignPtrs, so there is no inlining going on.

Back to the culprit, the call to g_static_mutex_lock. This is a macro that expands to *g_thread_functions_for_glib_use.mutex_lock, where g_thread_functions_for_glib_use is a global variable that contains a lot of function pointers. At the breakpoint, it contains this:

    (gdb) print g_thread_functions_for_glib_use
    $33 = {mutex_new = 0x7ffff0cd9820 <g_mutex_new_posix_impl>,
      mutex_lock = 0x7ffff6c8b3c0 <__pthread_mutex_lock>,
      mutex_trylock = 0x7ffff0cd97b0 <g_mutex_trylock_posix_impl>,
      mutex_unlock = 0x7ffff6c8ca00 <__pthread_mutex_unlock>,
      mutex_free = 0x7ffff0cd9740 <g_mutex_free_posix_impl>,
      [..]

So the call to g_mutex_lock should call the function __pthread_mutex_lock, but it calls NULL. Note that this disagrees with the debug printf above, which reported the same slot as 0.

I hoped that writing this email would give me a bit more insight into the problem, but for now I suspect that something overwrites either the stack or the code of the function.

On the same platform, the compiled version prints:

    acquiring lock to add a GtkButton object at 1b05820
    value of lock function is 7f7adcabd3c0
    within mutex: adding finalizer to a GtkButton object!

On Mac OS or i386, using ghci or ghc, version 6.10.4, it works as well.
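
As an aside, here is a minimal, self-contained sketch of the kind of dispatch described above: a global table of function pointers, and a lock call that goes through one of its slots. This is not GLib's code. The names thread_funcs, toy_mutex, funcs_ok, funcs_null and lock_via_table are all hypothetical and exist only for illustration. The guard checks the slot before calling through it and prints the address of the table. Printing that address is only a guess at a useful diagnostic: it might help tell whether the C code and gdb are looking at the same object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy mutex; stands in for a real lock in this sketch. */
typedef struct { int locked; } toy_mutex;

/* Hypothetical table of function pointers, loosely modelled on the idea
   behind g_thread_functions_for_glib_use (not GLib's actual layout). */
typedef struct {
    int (*mutex_lock)(toy_mutex *);
    int (*mutex_unlock)(toy_mutex *);
} thread_funcs;

static int toy_lock(toy_mutex *m)   { m->locked = 1; return 0; }
static int toy_unlock(toy_mutex *m) { m->locked = 0; return 0; }

/* A filled-in table, and one whose slots are all NULL. */
thread_funcs funcs_ok   = { toy_lock, toy_unlock };
thread_funcs funcs_null = { NULL, NULL };

/* Lock through the table. Returns 0 on success and -1 if the slot is
   NULL, instead of jumping to address 0 and crashing. */
int lock_via_table(const thread_funcs *t, toy_mutex *m)
{
    assert(t != NULL);
    if (t->mutex_lock == NULL) {
        fprintf(stderr, "mutex_lock slot is NULL in table at %p\n",
                (const void *) t);
        return -1;
    }
    return t->mutex_lock(m);
}
```

Calling lock_via_table(&funcs_null, &m) returns -1 and leaves m untouched, whereas an unguarded call through the NULL slot would jump to address 0. With &funcs_ok it returns 0 and sets m.locked.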

Now for the fun bit: on i386 using ghci version 6.12.1 it works too. So it's an x86-64 and ghc 6.12.1 bug. According to Christian Maeder, who submitted the ticket, the problem persists in 6.12.3.

Any hints and help appreciated,

Cheers,
Axel