Okay, I think I know what the problem is: Part of the SRFI-18 thread
start / creation process involves contention for a mutex, and there's
a bug in the fat_mutex_lock code that causes the locking thread to
sometimes miss an unlocking thread's notification that a mutex is
available.  So it's actually a mutex bug -- specifically, in the loop
code in fat_mutex_lock that ends with the following snippet:

      ...
          scm_i_pthread_mutex_unlock (&m->lock);
          SCM_TICK;
          scm_i_scm_pthread_mutex_lock (&m->lock);
        }
      block_self (m->waiting, mutex, &m->lock, timeout);

...which means that if the loop is entered while the mutex is still
locked, but the owner unlocks it after the locking thread has released
the administrative lock to run the tick, the locking thread never sees
the notification: it re-acquires the administrative lock and calls
block_self without re-checking the state of the mutex, so it sleeps
forever.  I've made a small change (blocking before doing the tick
instead of after) that seems to resolve the issue; so far there have
been no lock-ups running Han-Wen's x.test for a couple of hours.
There's a patch attached.
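
For anyone who wants the general rule spelled out, here is a minimal
sketch of it using plain pthreads.  It is not Guile's code: toy_mutex,
toy_lock, toy_unlock and run_demo are made-up names, and a condition
variable stands in for block_self.  The only point is that the waiter
re-tests the mutex state every time it holds the administrative lock
again, so an unlock that slipped in while that lock was released is not
missed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fat mutex (not Guile's real struct). */
typedef struct
{
  pthread_mutex_t lock;   /* administrative lock, in the role of m->lock */
  pthread_cond_t wakeup;  /* signalled by the unlocking thread */
  bool owned;             /* state of the toy mutex itself */
} toy_mutex;

static toy_mutex demo_mutex =
  { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false };
static long demo_counter = 0;

enum { DEMO_THREADS = 4, DEMO_ITERATIONS = 10000 };

static void
toy_lock (toy_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  /* Re-check `owned' each time the administrative lock is held again;
     only sleep if the mutex is still taken at that moment.  */
  while (m->owned)
    pthread_cond_wait (&m->wakeup, &m->lock);
  m->owned = true;
  pthread_mutex_unlock (&m->lock);
}

static void
toy_unlock (toy_mutex *m)
{
  pthread_mutex_lock (&m->lock);
  m->owned = false;
  pthread_cond_signal (&m->wakeup);
  pthread_mutex_unlock (&m->lock);
}

static void *
demo_worker (void *arg)
{
  (void) arg;
  for (int i = 0; i < DEMO_ITERATIONS; i++)
    {
      toy_lock (&demo_mutex);
      demo_counter++;          /* protected by the toy mutex */
      toy_unlock (&demo_mutex);
    }
  return NULL;
}

/* Runs the contended workers; returns the final counter, or -1 if a
   thread could not be created.  */
static long
run_demo (void)
{
  pthread_t threads[DEMO_THREADS];
  int started = 0;

  demo_counter = 0;
  while (started < DEMO_THREADS
         && pthread_create (&threads[started], NULL, demo_worker, NULL) == 0)
    started++;
  for (int i = 0; i < started; i++)
    pthread_join (threads[i], NULL);
  return started == DEMO_THREADS ? demo_counter : -1;
}
```

Calling run_demo () from a small main (built with -pthread) should
return DEMO_THREADS * DEMO_ITERATIONS if no wakeup is lost and no
increment is dropped; a version that slept without re-testing `owned'
could hang instead.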

(Sorry, should have noticed this earlier; the problem existed before
the changes I introduced to support SRFI-18...)


Regards,
Julian


On Wed, Aug 27, 2008 at 9:14 AM, Julian Graham <[EMAIL PROTECTED]> wrote:
>> I've seen `srfi-18.test' hang from time to time, but not often enough to
>> give me an incentive to nail it down.  :-(  I don't think it relates to
>> Han-Wen's GC changes.
>
>
> Crap, I'm seeing some lockups now, too.  Sorry, guys.  I'm debugging,
> but don't let that stop you from investigating as well.  ;)
From 12a5c7ca5e0bec9386ee24b2e2320d1aa03e55d5 Mon Sep 17 00:00:00 2001
From: Julian Graham <[EMAIL PROTECTED](none)>
Date: Sat, 30 Aug 2008 19:03:21 -0400
Subject: [PATCH] Resolve a deadlock caused by not checking mutex state after calling
 `SCM_TICK'.

---
 libguile/ChangeLog |    5 +++++
 libguile/threads.c |    2 +-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/libguile/ChangeLog b/libguile/ChangeLog
index e8d9362..40e2bb4 100644
--- a/libguile/ChangeLog
+++ b/libguile/ChangeLog
@@ -1,3 +1,8 @@
+2008-08-29  Julian Graham  <[EMAIL PROTECTED]>
+
+	* threads.c (fat_mutex_lock): Resolve a deadlock caused by not
+	checking mutex state after calling `SCM_TICK'.	
+
 2008-08-27  Ludovic Courtès  <[EMAIL PROTECTED]>
 
 	Fix builds `--without-threads'.  Reported by Han-Wen Nienhuys
diff --git a/libguile/threads.c b/libguile/threads.c
index 7e55f3b..8699fd0 100644
--- a/libguile/threads.c
+++ b/libguile/threads.c
@@ -1292,11 +1292,11 @@ fat_mutex_lock (SCM mutex, scm_t_timespec *timeout, SCM owner, int *ret)
 		  break;
 		}
 	    }
+	  block_self (m->waiting, mutex, &m->lock, timeout);
 	  scm_i_pthread_mutex_unlock (&m->lock);
 	  SCM_TICK;
 	  scm_i_scm_pthread_mutex_lock (&m->lock);
 	}
-      block_self (m->waiting, mutex, &m->lock, timeout);
     }
   scm_i_pthread_mutex_unlock (&m->lock);
   return err;
-- 
1.5.4.3
