System asyncs and mutexes: a combination prone to deadlocks

Mark H Weaver Mon, 19 Aug 2013 20:01:44 -0700

Hello all,

While working on making (ice-9 popen) thread-safe, I've discovered a
serious problem with system asyncs and mutexes.


System asyncs can run while mutexes are locked.  Asyncs can run
arbitrary Scheme code, so of course mutexes will often be locked within
asyncs as well.  So what happens if an async tries to lock a mutex that
has already been locked by the same thread?  Deadlock, of course.

Recursive mutexes are not a solution.  They would avoid the deadlock,
but they would leave open the possibility of corrupted data structures,
because the async might be run while a data structure is in an
inconsistent state.  If the async tries to access that data structure,
things could get ugly.

In popen, there are data structures (the port table and the guardian)
that need to be locked both outside and within asyncs, so I addressed
the problem by blocking asyncs before grabbing the lock:

  (define-syntax-rule (with-popen-tables-locked e0 e ...)
    (call-with-blocked-asyncs
     (lambda ()
       (with-mutex popen-mutex e0 e ...))))

This prevents deadlock on this particular mutex, but what about all the
other mutexes used throughout Guile?
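
To illustrate what blocking asyncs buys us, here is a minimal sketch (not
the popen code). The names `demo-mutex', `events' and `note!' are
hypothetical, invented for this example. It queues an async for the current
thread while the mutex is held; because asyncs are blocked, the async cannot
run until the critical section has finished and the mutex has been released.

```scheme
;; Sketch only: demo-mutex, events and note! are hypothetical names.
(use-modules (ice-9 threads))           ; provides with-mutex

(define demo-mutex (make-mutex))
(define events '())                     ; most recent event first
(define (note! x) (set! events (cons x events)))

(call-with-blocked-asyncs
 (lambda ()
   (with-mutex demo-mutex
     ;; Queue an async for the current thread.  Asyncs are blocked
     ;; here, so it cannot run while demo-mutex is held.
     (system-async-mark (lambda () (note! 'async)))
     (note! 'critical-section))))

;; The queued async only becomes eligible to run once the block is
;; lifted, so 'critical-section is always recorded first.
(define (events-in-order) (reverse events))
```

Without the call-with-blocked-asyncs wrapper, the async could fire at any
safe point inside the with-mutex body, which is exactly the window in which
an async that also takes demo-mutex would get into trouble.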

The deadlock I happen to be seeing during 'make check' is from the
'overrides_lock' in procprop.c, but there are scores of other mutexes
around the system that could cause the same problem.

It seems to me that system asyncs are a fundamentally flawed concept in
any system that uses mutexes.  They need to be run in a different thread
to prevent these deadlocks.

   Thoughts?
      Mark
