On 16/07/2010 14:04, Axel Simon wrote:
> Hi Simon,
>
> On 16.07.2010, at 14:29, Simon Marlow wrote:
>
>> On 16/07/2010 12:36, Axel Simon wrote:
>>> Dear Haskell maintainers,
>>>
>>> I've progressed a little and found that the problem is down to
>>> accessing global variables that are declared in dynamic libraries. In
>>> a nutshell, this doesn't work, as the addresses of these global
>>> variables are all wrong when ghci is executing the code. So, I think
>>> I hit:
>>>
>>> http://hackage.haskell.org/trac/ghc/ticket/781
>>>
>>> I was able to work around this problem by compiling the C modules with
>>> -fPIC. This bug is pretty bad, I'd say. I've added myself to its CC
>>> list.
>>
>> Urgh. It's a nasty bug, but not one that we can fix, because it's an
>> artifact of the small memory model used on x86_64. The only fix is to
>> use -fPIC.
>>
>> It might be possible to use -fPIC either by default, or perhaps just
>> for .c files and when compiling data references from FFI declarations
>> in Haskell code, that's something we could look into. We might want
>> -fPIC on by default anyway if we switch to using dynamic linking by
>> default (but we're not yet sure what ramifications that will have).
>>
>
> Well, my fix is:
>
> if arch(x86_64)
>   cc-options: -fPIC
>
> This only affects the C files we compile, of which there are only two at
> the moment. I am happy with this solution since I know which files are
> affected.
>
> But basically this bug will hit me whenever I use a global C variable
> from within Haskell? I hope there are none that we use; they should all
> be accessed using functions, so we should be safe.
A reference to data that resides in a shared library, yes. It's
surprising how rarely this happens in fact.

Cheers,
Simon

> Cheers,
> Axel
>
>
>>
>> Cheers,
>> Simon
>>
>>
>>
>>> Cheers,
>>> Axel
>>>
>>> On 14.07.2010, at 16:51, Axel Simon wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm trying to debug a segfault relating to the memory management in
>>>> Gtk2Hs. Rather than make you read the ticket
>>>> http://hackage.haskell.org/trac/gtk2hs/ticket/1183
>>>> , I'll describe the problem:
>>>>
>>>> - compiler 6.12.1 or 6.12.3
>>>> - darcs head of Gtk2Hs with #define DEBUG instead of #undef DEBUG in
>>>>   gtk/Graphics/UI/Gtk/General/hsthread.c
>>>> - platform Ubuntu Linux, x86-64
>>>> - to reproduce: cd gtk2hs/gtk/demo/hello and run ghci World.hs and
>>>>   type 'main'
>>>>
>>>> A window with the "Hello World" button appears. After a few seconds,
>>>> the GC runs and the finaliser of the GtkButton is run since the
>>>> Haskell program no longer holds a reference to that object (only the
>>>> GtkWindow in C land has).
>>>>
>>>> Thus, the GC calls a C function gtk2hs_g_object_unref_from_mainloop
>>>> which is supposed to enqueue the object into a global data structure
>>>> from which objects are later taken and g_object_unref is called on
>>>> them.
>>>>
>>>> This global data structure is protected by a mutex, which is
>>>> acquired using g_static_mutex_lock:
>>>>
>>>> void gtk2hs_g_object_unref_from_mainloop(gpointer object) {
>>>>
>>>>   int mutex_locked = 0;
>>>>   if (threads_initialised) {
>>>> #ifdef DEBUG
>>>>     printf("acquiring lock to add a %s object at %lx\n",
>>>>            g_type_name(G_OBJECT_TYPE(object)), (unsigned long) object);
>>>>     printf("value of lock function is %lx\n",
>>>>            (unsigned long) g_thread_functions_for_glib_use.mutex_lock);
>>>> #endif
>>>>     g_rand_new();
>>>> #if defined( WIN32 )
>>>>     EnterCriticalSection(&gtk2hs_finalizer_mutex);
>>>> #else
>>>>     g_static_mutex_lock(&gtk2hs_finalizer_mutex);
>>>> #endif
>>>>     mutex_locked = 1;
>>>>   }
>>>> [..]
>>>>
>>>> The program prints:
>>>>
>>>> acquiring lock to add a GtkButton object at 22d8020
>>>> value of lock function is 0
>>>> zsh: segmentation fault ghci World
>>>>
>>>> Now the debugging weirdness starts. Whatever I do, I cannot get gdb
>>>> to find the symbol gtk2hs_g_object_unref_from_mainloop.
>>>>
>>>> Since the function above is contained in a C file that comes with
>>>> our Haskell library, I tried to add "cc-options: -g" and
>>>> "cc-options: -ggdb -O0", but maybe somewhere symbols are stripped.
>>>> So I added the bogus function call to "g_rand_new()", which is not
>>>> called anywhere else, and gdb stops as follows:
>>>>
>>>> acquiring lock to add a GtkButton object at 2105020
>>>> value of lock function is 0
>>>> [Switching to Thread 0x7ffff41ff710 (LWP 15735)]
>>>>
>>>> Breakpoint 12, 0x00007ffff115bfa0 in g_rand_new () from
>>>> /usr/lib/libglib-2.0.so
>>>>
>>>> This all seems reasonable, but:
>>>>
>>>> (gdb) bt
>>>> #0 0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so
>>>> #1 0x00000000419b3792 in ?? ()
>>>> #2 0x00007ffff678f078 in ?? ()
>>>>
>>>> i.e. the calling context is broken. I'm very, very sure that the
>>>> caller is indeed the above-mentioned function, since g_rand_new
>>>> isn't called anywhere in my Haskell program (and otherwise the
>>>> calling context would be sane).
>>>> I'm also passing the address of gtk2hs_g_object_unref_from_mainloop
>>>> as FinalizerPtr to all my ForeignPtrs, so there is no inlining going
>>>> on.
>>>>
>>>> Back to the culprit, the call to g_static_mutex_lock. This is a
>>>> macro that expands to
>>>>
>>>> *g_thread_functions_for_glib_use.mutex_lock
>>>>
>>>> where g_thread_functions_for_glib_use is a global variable that
>>>> contains a lot of function pointers.
>>>> At the break point, it contains this:
>>>>
>>>> (gdb) print g_thread_functions_for_glib_use
>>>> $33 = {mutex_new = 0x7ffff0cd9820<g_mutex_new_posix_impl>,
>>>> mutex_lock = 0x7ffff6c8b3c0<__pthread_mutex_lock>,
>>>> mutex_trylock = 0x7ffff0cd97b0<g_mutex_trylock_posix_impl>,
>>>> mutex_unlock = 0x7ffff6c8ca00<__pthread_mutex_unlock>,
>>>> mutex_free = 0x7ffff0cd9740<g_mutex_free_posix_impl>,
>>>> [..]
>>>>
>>>> So the call to g_mutex_lock should call the function
>>>> __pthread_mutex_lock but it calls NULL.
>>>>
>>>> I hoped that writing this email would give me a bit more insight
>>>> into the problem, but for now I suspect that something overwrites
>>>> either the stack or the code of the function.
>>>>
>>>> On the same platform, the compiled version prints:
>>>>
>>>> acquiring lock to add a GtkButton object at 1b05820
>>>> value of lock function is 7f7adcabd3c0
>>>> within mutex: adding finalizer to a GtkButton object!
>>>>
>>>> On Mac OS or i386, using ghci or ghc, version 6.10.4, it works as
>>>> well.
>>>> Now for the fun bit: on i386 using ghci version 6.12.1 it works too.
>>>>
>>>> So it's an x86-64 and ghc 6.12.1 bug. According to Christian Maeder
>>>> who submitted the ticket, the problem persists in 6.12.3.
>>>>
>>>> Any hints and help appreciated,
>>>> Cheers,
>>>> Axel
>>>>
>>>> _______________________________________________
>>>> Glasgow-haskell-users mailing list
>>>> glasgow-haskell-us...@haskell.org
>>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> This SF.net email is sponsored by Sprint
>>> What will you do first with EVO, the first 4G phone?
>>> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
>>> _______________________________________________
>>> Gtk2hs-devel mailing list
>>> Gtk2hs-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/gtk2hs-devel
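
For illustration only, the point raised in this thread (reach library data through functions rather than through a direct reference to a global variable) might look roughly like the C sketch below. All names prefixed with demo_ are hypothetical and are not part of GLib or Gtk2Hs; the struct merely imitates the shape of a table of function pointers such as g_thread_functions_for_glib_use. Everything lives in one file here, so this does not reproduce the ghci problem; it only shows the accessor pattern, on the assumption that the table would normally sit in a shared library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a table of function pointers that a shared
   library exports as a global variable. This is NOT the real GLib type. */
typedef struct {
    int (*mutex_lock)(void *mutex);
    int (*mutex_unlock)(void *mutex);
} demo_thread_functions;

/* Dummy implementations so the sketch is self-contained. */
static int demo_lock(void *mutex)   { (void) mutex; return 0; }
static int demo_unlock(void *mutex) { (void) mutex; return 0; }

/* Assumption: in a real setup this object would be defined inside the
   shared library, not in the client code. */
demo_thread_functions demo_functions_table = { demo_lock, demo_unlock };

/* Accessor the library would export. Clients obtain the address of the
   table through a function call instead of referencing the data symbol
   directly. */
const demo_thread_functions *demo_get_thread_functions(void) {
    return &demo_functions_table;
}

/* Client-side helper: looks up the lock function through the accessor
   and refuses to call a NULL pointer. Returns -1 in that case. */
int demo_lock_via_accessor(void *mutex) {
    const demo_thread_functions *fns = demo_get_thread_functions();
    assert(fns != NULL);
    if (fns->mutex_lock == NULL) {
        fprintf(stderr, "lock function pointer is NULL\n");
        return -1;
    }
    return fns->mutex_lock(mutex);
}
```

A caller would use demo_lock_via_accessor(&some_mutex) where it would otherwise dereference the global table itself. The NULL check is only a defensive measure that turns a call through a bad pointer into an error return; it is not a replacement for compiling the C files with -fPIC as described above.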