Okay, I think I know what the problem is: part of the SRFI-18 thread start / creation process involves contention for a mutex, and there's a bug in the fat_mutex_lock code that causes the locking thread to sometimes miss an unlocking thread's notification that a mutex is available. So it's actually a mutex bug -- specifically, in the loop code in fat_mutex_lock that ends with the following snippet:

    ...
          scm_i_pthread_mutex_unlock (&m->lock);
          SCM_TICK;
          scm_i_scm_pthread_mutex_lock (&m->lock);
        }
      block_self (m->waiting, mutex, &m->lock, timeout);
    ...

...which means that if the loop is entered while the mutex is still locked but the owner unlocks it after the locking thread releases the administrative lock to run the tick, the locking thread will sleep forever because it doesn't re-check the state of the mutex.

I've made a small change (blocking before doing the tick instead of after) that seems to resolve the issue (so far no lock-ups using Han-Wen's x.test for a couple of hours). There's a patch attached.

(Sorry, should have noticed this earlier; the problem existed before the changes I introduced to support SRFI-18...)


Regards,
Julian


On Wed, Aug 27, 2008 at 9:14 AM, Julian Graham <[EMAIL PROTECTED]> wrote:
>> I've seen `srfi-18.test' hang from time to time, but not often enough to
>> give me an incentive to nail it down. :-( I don't think it relates to
>> Han-Wen's GC changes.
>
>
> Crap, I'm seeing some lockups now, too. Sorry, guys. I'm debugging,
> but don't let that stop you from investigating as well. ;)

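P.S. In case it helps anyone reviewing the fix, here is a minimal sketch of the ordering it relies on: block first, then run the tick with the administrative lock dropped, then go back to the top of the loop and re-check the mutex state before blocking again. This is not Guile code. The names toy_mutex, toy_lock, toy_unlock and toy_demo are made up for illustration, a pthread condition variable stands in for block_self and the m->waiting queue, and sched_yield stands in for SCM_TICK.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical toy mutex for illustration only; not Guile's fat_mutex.  */
typedef struct
{
  pthread_mutex_t lock;    /* administrative lock, playing the role of m->lock */
  pthread_cond_t waiting;  /* assumed stand-in for the m->waiting queue */
  int locked;              /* nonzero while some thread owns the toy mutex */
} toy_mutex;

static toy_mutex tm =
  { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 };

static void
toy_lock (toy_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  for (;;)
    {
      /* The state is only ever examined with the administrative lock held.  */
      if (!m->locked)
        {
          m->locked = 1;
          break;
        }
      /* Block first.  pthread_cond_wait releases m->lock atomically while
         going to sleep, so an unlock cannot slip in between the check
         above and the wait.  */
      pthread_cond_wait (&m->waiting, &m->lock);
      /* Stand-in for SCM_TICK: runs with the administrative lock dropped.  */
      pthread_mutex_unlock (&m->lock);
      sched_yield ();
      pthread_mutex_lock (&m->lock);
      /* Fall through to the top of the loop, which re-checks m->locked
         before this thread can block again.  */
    }
  pthread_mutex_unlock (&m->lock);
}

static void
toy_unlock (toy_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  m->locked = 0;
  pthread_cond_signal (&m->waiting);  /* wake one waiter, if there is one */
  pthread_mutex_unlock (&m->lock);
}

enum { N_THREADS = 4, N_ITERS = 10000 };
static long counter;

static void *
worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < N_ITERS; i++)
    {
      toy_lock (&tm);
      counter++;              /* protected by the toy mutex */
      toy_unlock (&tm);
    }
  return NULL;
}

/* Runs the contention test and returns the final counter value.  */
long
toy_demo (void)
{
  pthread_t threads[N_THREADS];
  counter = 0;
  for (int i = 0; i < N_THREADS; i++)
    {
      int rc = pthread_create (&threads[i], NULL, worker, NULL);
      assert (rc == 0);
      (void) rc;
    }
  for (int i = 0; i < N_THREADS; i++)
    pthread_join (threads[i], NULL);
  return counter;
}
```

Calling toy_demo () from a main function (built with -pthread) should return N_THREADS * N_ITERS once every worker has finished. If the tick were moved in front of the wait without a re-check in between, a waiter could go to sleep after the owner had already signalled, which is the kind of hang described above.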
From 12a5c7ca5e0bec9386ee24b2e2320d1aa03e55d5 Mon Sep 17 00:00:00 2001
From: Julian Graham <[EMAIL PROTECTED](none)>
Date: Sat, 30 Aug 2008 19:03:21 -0400
Subject: [PATCH] Resolve a deadlock caused by not checking mutex state after calling `SCM_TICK'.

---
 libguile/ChangeLog |    5 +++++
 libguile/threads.c |    2 +-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/libguile/ChangeLog b/libguile/ChangeLog
index e8d9362..40e2bb4 100644
--- a/libguile/ChangeLog
+++ b/libguile/ChangeLog
@@ -1,3 +1,8 @@
+2008-08-29  Julian Graham  <[EMAIL PROTECTED]>
+
+        * threads.c (fat_mutex_lock): Resolve a deadlock caused by not
+        checking mutex state after calling `SCM_TICK'.
+
 2008-08-27  Ludovic Courtès  <[EMAIL PROTECTED]>
 
         Fix builds `--without-threads'.  Reported by Han-Wen Nienhuys
diff --git a/libguile/threads.c b/libguile/threads.c
index 7e55f3b..8699fd0 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1292,11 +1292,11 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
                   break;
                 }
             }
+          block_self (m->waiting, mutex, &m->lock, timeout);
           scm_i_pthread_mutex_unlock (&m->lock);
           SCM_TICK;
           scm_i_scm_pthread_mutex_lock (&m->lock);
         }
-      block_self (m->waiting, mutex, &m->lock, timeout);
     }
   scm_i_pthread_mutex_unlock (&m->lock);
   return err;
-- 
1.5.4.3