I've started working through the lock ordering and threading issues that we have. My plan is (with 1.8.x)
- first to address problems reported by helgrind (since I think we should take advantage of external tools before adding debug code to Guile internally) - then to run Linas's define-race program, and address the problems that that shows up (including thread-safe define, hopefully) - then (or maybe as part of the previous step) to look again at Linas's lock order debugging patch, to see if it would make sense to apply some of that. Does that sound sensible; have I missed anything? A patch for the first helgrind problem is attached; any comments would be appreciated, as ever. Regards, Neil
>From 3dcf473e9e535314fc6dedc99ff8d9132e7a3e3e Mon Sep 17 00:00:00 2001 From: Neil Jerram <n...@ossau.uklinux.net> Date: Wed, 11 Feb 2009 21:22:57 +0000 Subject: [PATCH] Lock ordering: don't lock heap_mutex and then thread_admin_mutex This fixes the following helgrind report. Thread #1: lock order "0x4325084 before 0x4105328" violated at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40D01EA: scm_i_thread_put_to_sleep (threads.c:1622) by 0x4078958: scm_make_fluid (fluids.c:114) by 0x40778D6: scm_init_feature (feature.c:101) by 0x408C62E: scm_i_init_guile (init.c:464) by 0x40D1873: scm_i_init_thread_for_guile (threads.c:583) by 0x40D18B4: scm_i_with_guile_and_parent (threads.c:725) by 0x40D19BD: scm_with_guile (threads.c:714) by 0x408C42E: scm_boot_guile (init.c:350) by 0x8048710: main (guile.c:69) Required order was established by acquisition of lock at 0x4325084 at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40CFFA4: guilify_self_1 (threads.c:468) by 0x408C58B: scm_i_init_guile (init.c:421) by 0x40D1873: scm_i_init_thread_for_guile (threads.c:583) by 0x40D18B4: scm_i_with_guile_and_parent (threads.c:725) by 0x40D19BD: scm_with_guile (threads.c:714) by 0x408C42E: scm_boot_guile (init.c:350) by 0x8048710: main (guile.c:69) followed by a later acquisition of lock at 0x4105328 at 0x40234F7: pthread_mutex_lock (hg_intercepts.c:408) by 0x40CFFAC: guilify_self_1 (threads.c:470) by 0x408C58B: scm_i_init_guile (init.c:421) by 0x40D1873: scm_i_init_thread_for_guile (threads.c:583) by 0x40D18B4: scm_i_with_guile_and_parent (threads.c:725) by 0x40D19BD: scm_with_guile (threads.c:714) by 0x408C42E: scm_boot_guile (init.c:350) by 0x8048710: main (guile.c:69) * threads.c (guilify_self_1): Add self to global thread list _before_ entering Guile mode. --- libguile/threads.c | 10 ++++++++-- 1 files changed, 8 insertions(+), 2 deletions(-) diff --git a/libguile/threads.c b/libguile/threads.c index ba17746..05e01e3 100644 --- a/libguile/threads.c +++ b/libguile/threads.c @@ -465,13 +465,19 @@ guilify_self_1 (SCM_STACKITEM *base) scm_i_pthread_setspecific (scm_i_thread_key, t); - scm_i_pthread_mutex_lock (&t->heap_mutex); - + /* As soon as this thread adds itself to the global thread list, the + GC may think that it has a stack that needs marking. Therefore + initialize t->top to be the same as t->base, just in case GC runs + before the thread can lock its heap_mutex for the first time. */ + t->top = t->base; scm_i_pthread_mutex_lock (&thread_admin_mutex); t->next_thread = all_threads; all_threads = t; thread_count++; scm_i_pthread_mutex_unlock (&thread_admin_mutex); + + /* Enter Guile mode. */ + scm_enter_guile (t); } /* Perform second stage of thread initialisation, in guile mode. -- 1.5.6.5