On 16/07/2010 12:36, Axel Simon wrote:
Dear Haskell maintainers,

I've progressed a little and found that the problem comes down to
accessing global variables that are declared in dynamic libraries. In
a nutshell, this doesn't work, as the addresses of these global
variables are all wrong when ghci is executing the code. So I think I
hit:

http://hackage.haskell.org/trac/ghc/ticket/781

I was able to work around this problem by compiling the C modules with
-fPIC. This bug is pretty bad, I'd say. I've added myself to its CC
list.

Urgh. It's a nasty bug, but not one that we can fix, because it's an artifact of the small memory model used on x86_64. The only fix is to use -fPIC.

It might be possible to use -fPIC, either by default or perhaps just for .c files and when compiling data references from FFI declarations in Haskell code; that's something we could look into. We might want -fPIC on by default anyway if we switch to using dynamic linking by default (but we're not yet sure what ramifications that will have).
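To make the small-memory-model limit concrete, here is a rough sketch of the range check involved. It assumes non-PIC code refers to a global through a signed 32-bit field, either an absolute value (R_X86_64_32S) or a PC-relative displacement (R_X86_64_PC32); the function names are hypothetical. The addresses in the usage note come from the gdb backtrace in the quoted mail: 0x419b3792 is the caller loaded by ghci, 0x7ffff115bfa0 lies inside libglib-2.0.so.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Absolute form (assumed R_X86_64_32S): the target address itself
   must lie in [-2^31, 2^31). */
static bool fits_abs32s(uint64_t target)
{
    int64_t v = (int64_t)target;
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* PC-relative form (assumed R_X86_64_PC32): the distance from the
   referencing site (roughly, the end of the instruction) to the target
   must fit in a signed 32-bit displacement. */
static bool fits_pcrel32(uint64_t site, uint64_t target)
{
    int64_t disp = (int64_t)(target - site);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Print whether either 32-bit encoding could reach the target. */
static void report(uint64_t site, uint64_t target)
{
    printf("site %#llx -> target %#llx: abs32s %s, pcrel32 %s\n",
           (unsigned long long)site, (unsigned long long)target,
           fits_abs32s(target) ? "ok" : "out of range",
           fits_pcrel32(site, target) ? "ok" : "out of range");
}
```

With those addresses, report(0x419b3792, 0x7ffff115bfa0) finds both encodings out of range: the gap is 0x7fffaf7a880e bytes, about 2^47, far beyond the +/-2 GiB a 32-bit field can express. As we understand it, -fPIC helps because the code then loads the variable's address from a GOT entry holding a full 64-bit pointer, so only the GOT entry has to be near the code.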

Cheers,
        Simon



Cheers,
Axel

On 14.07.2010, at 16:51, Axel Simon wrote:

Hi all,

I'm trying to debug a segfault relating to the memory management in
Gtk2Hs. Rather than make you read the ticket
http://hackage.haskell.org/trac/gtk2hs/ticket/1183, I'll describe the
problem:

- compiler 6.12.1 or 6.12.3
- darcs head of Gtk2Hs with #define DEBUG instead of #undef DEBUG in
gtk/Graphics/UI/Gtk/General/hsthread.c
- platform Ubuntu Linux, x86-64
- to reproduce: cd gtk2hs/gtk/demo/hello and run ghci World.hs and
type 'main'

A window with the "Hello World" button appears. After a few seconds,
the GC runs and the finaliser of the GtkButton is run since the
Haskell program no longer holds a reference to that object (only the
GtkWindow in C land does).

Thus, the GC calls a C function gtk2hs_g_object_unref_from_mainloop
which is supposed to enqueue the object into a global data structure
from which objects are later taken and g_object_unref is called on
them.
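The deferred-unref scheme can be sketched roughly as follows. This is a minimal sketch, not the Gtk2Hs code: it swaps in POSIX threads for GLib's static mutex, and the names enqueue_unref_from_finalizer, drain_pending_unrefs and counting_unref are hypothetical. The finaliser only records the object under the lock; the main loop later detaches the whole list under the lock and calls the unref function outside it.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for g_object_unref (hypothetical signature). */
typedef void (*unref_fn)(void *object);

struct pending_node {
    void *object;
    struct pending_node *next;
};

/* The global queue and the mutex protecting it. */
static pthread_mutex_t pending_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct pending_node *pending_head = NULL;

/* Called from the GC finaliser: only records the object, never unrefs it. */
int enqueue_unref_from_finalizer(void *object)
{
    struct pending_node *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->object = object;
    pthread_mutex_lock(&pending_mutex);
    node->next = pending_head;
    pending_head = node;
    pthread_mutex_unlock(&pending_mutex);
    return 0;
}

/* Called from the main loop: take the whole list under the lock,
   then unref each object with the lock released. Returns the count. */
size_t drain_pending_unrefs(unref_fn unref)
{
    pthread_mutex_lock(&pending_mutex);
    struct pending_node *list = pending_head;
    pending_head = NULL;
    pthread_mutex_unlock(&pending_mutex);

    size_t count = 0;
    while (list != NULL) {
        struct pending_node *next = list->next;
        unref(list->object);
        free(list);
        list = next;
        count++;
    }
    return count;
}

/* Counting stand-in for g_object_unref, for the usage example only. */
static int unref_calls = 0;
static void counting_unref(void *object)
{
    (void)object;
    unref_calls++;
}
```

Detaching the list while holding the lock keeps the critical section short and means the unref callback never runs with the mutex held. Objects come out in reverse order of arrival, which should not matter for dropping references.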

This global data structure is protected by a mutex, which is
acquired using g_static_mutex_lock:

void gtk2hs_g_object_unref_from_mainloop(gpointer object) {

  int mutex_locked = 0;
  if (threads_initialised) {
#ifdef DEBUG
      printf("acquiring lock to add a %s object at %lx\n",
             g_type_name(G_OBJECT_TYPE(object)), (unsigned long) object);
      printf("value of lock function is %lx\n",
             (unsigned long) g_thread_functions_for_glib_use.mutex_lock);
#endif
    g_rand_new();
#if defined( WIN32 )
    EnterCriticalSection(&gtk2hs_finalizer_mutex);
#else
    g_static_mutex_lock(&gtk2hs_finalizer_mutex);
#endif
    mutex_locked = 1;
  }
[..]

The program prints:

acquiring lock to add a GtkButton object at 22d8020
value of lock function is 0
zsh: segmentation fault  ghci World

Now the debugging weirdness starts. Whatever I do, I cannot get gdb
to find the symbol gtk2hs_g_object_unref_from_mainloop.

Since the function above is contained in a C file that comes with
our Haskell library, I tried adding "cc-options: -g" and "cc-options:
-ggdb -O0", but perhaps the symbols are being stripped somewhere. So I
added a bogus call to g_rand_new(), which is not called anywhere else,
and gdb stops as follows:

acquiring lock to add a GtkButton object at 2105020
value of lock function is 0
[Switching to Thread 0x7ffff41ff710 (LWP 15735)]

Breakpoint 12, 0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so

This all seems reasonable, but:

(gdb) bt
#0  0x00007ffff115bfa0 in g_rand_new () from /usr/lib/libglib-2.0.so
#1  0x00000000419b3792 in ?? ()
#2  0x00007ffff678f078 in ?? ()

i.e. the calling context is broken. I'm very, very sure that the
caller is indeed the above-mentioned function, since g_rand_new isn't
called anywhere else in my Haskell program (and otherwise the calling
context would be sane).
I'm also passing the address of gtk2hs_g_object_unref_from_mainloop
as FinalizerPtr to all my ForeignPtrs, so there is no inlining going
on.

Back to the culprit, the call to g_static_mutex_lock. This is a
macro that expands to

*g_thread_functions_for_glib_use.mutex_lock

where g_thread_functions_for_glib_use is a global variable that
contains a lot of function pointers. At the breakpoint, it contains this:

(gdb) print g_thread_functions_for_glib_use
$33 = {mutex_new = 0x7ffff0cd9820 <g_mutex_new_posix_impl>,
  mutex_lock = 0x7ffff6c8b3c0 <__pthread_mutex_lock>,
  mutex_trylock = 0x7ffff0cd97b0 <g_mutex_trylock_posix_impl>,
  mutex_unlock = 0x7ffff6c8ca00 <__pthread_mutex_unlock>,
  mutex_free = 0x7ffff0cd9740 <g_mutex_free_posix_impl>,
[..]

So the call to g_mutex_lock should call __pthread_mutex_lock, but
instead it calls NULL.
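The crash is an indirect call through a table of function pointers whose entry reads as 0 (matching "value of lock function is 0" above). A minimal sketch of that indirection, with a guard that reports a NULL entry instead of jumping to address 0, might look like this. The struct, table and function names are hypothetical stand-ins for GLib's table, and pthread_mutex_t replaces GMutex so the example is self-contained.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical, cut-down analogue of GLib's table of thread functions. */
struct thread_functions {
    int (*mutex_lock)(pthread_mutex_t *m);
    int (*mutex_unlock)(pthread_mutex_t *m);
};

/* A correctly initialised global table, as the library itself sees it. */
static struct thread_functions demo_thread_functions = {
    pthread_mutex_lock,
    pthread_mutex_unlock
};

/* Call mutex_lock through the table, but refuse to jump to NULL.
   Code that reads the table from the wrong address (as the ghci-loaded
   object apparently did) would see a zero entry here. */
static int checked_lock(const struct thread_functions *fns, pthread_mutex_t *m)
{
    if (fns->mutex_lock == NULL) {
        fprintf(stderr, "mutex_lock entry is NULL: table read from wrong address?\n");
        return -1;
    }
    return fns->mutex_lock(m);
}
```

Such a guard only turns the segfault into a diagnostic; it does not fix the wrong address, which is what -fPIC addresses.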

I hoped that writing this email would give me a bit more insight
into the problem, but for now I suspect that something overwrites
either the stack or the code of the function.

On the same platform, the compiled version prints:

acquiring lock to add a GtkButton object at 1b05820
value of lock function is 7f7adcabd3c0
within mutex: adding finalizer to a GtkButton object!

On Mac OS or i386, using either ghci or ghc version 6.10.4, it works
as well.
Now for the fun bit: on i386 using ghci version 6.12.1 it works too.

So it's an x86-64 and ghc 6.12.1 bug. According to Christian Maeder,
who submitted the ticket, the problem persists in 6.12.3.

Any hints and help appreciated,
Cheers,
Axel







_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


_______________________________________________
Gtk2hs-devel mailing list
gtk2hs-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/gtk2hs-devel
