Hello all,

While working on making (ice-9 popen) thread-safe, I've discovered a serious problem with system asyncs and mutexes.

System asyncs can run while mutexes are locked. Asyncs can run arbitrary scheme code, so of course mutexes will often be locked within asyncs as well. So what happens if an async tries to lock a mutex that has already been locked by the same thread? Deadlock, of course.

Recursive mutexes are not a solution. They would avoid the deadlock, but they would leave open the possibility of corrupted data structures, because the async might be run while a data structure is in an inconsistent state. If the async tries to access that data structure, things could get ugly.

In popen, there are data structures (the port table and the guardian) that need to be locked both outside and within asyncs, so I addressed the problem by blocking asyncs before grabbing the lock:

  (define-syntax-rule (with-popen-tables-locked e0 e ...)
    (call-with-blocked-asyncs
     (lambda ()
       (with-mutex popen-mutex e0 e ...))))

This prevents deadlock by this particular mutex, but what about all the other mutexes used throughout Guile? The deadlock I happen to be seeing during 'make check' is from the 'overrides_lock' in procprop.c, but there are scores of other mutexes around the system that could cause the same problem.

It seems to me that system asyncs are a fundamentally flawed concept in any system that uses mutexes. They need to be run in a different thread to prevent these deadlocks.

Thoughts?

    Mark
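
For illustration, here is a minimal sketch of the "block asyncs, then lock" pattern in isolation. It is not the popen code: demo-mutex, events, note! and timeline are hypothetical names made up for this example. It assumes only the standard Guile procedures system-async-mark and call-with-blocked-asyncs, plus with-mutex from (ice-9 threads).

```scheme
(use-modules (ice-9 threads))   ; provides with-mutex

;; Hypothetical names, for this sketch only.
(define demo-mutex (make-mutex))
(define events '())
(define (note! tag) (set! events (cons tag events)))

(call-with-blocked-asyncs
 (lambda ()
   (with-mutex demo-mutex
     ;; Queue an async for the current thread while the lock is held.
     ;; Asyncs are blocked here, so it cannot run inside the critical
     ;; section, and so cannot try to re-lock demo-mutex or observe
     ;; half-updated state.
     (system-async-mark (lambda () (note! 'async-ran)))
     (note! 'critical-section-done))))

;; Events in chronological order.
(define timeline (reverse events))
```

The queued async should only get a chance to run once asyncs are unblocked again, which is after with-mutex has released the lock, so critical-section-done is recorded first. This protects only the one mutex that is wrapped this way. A mutex locked without blocking asyncs (such as the C-level 'overrides_lock') is still exposed to the problem described above.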