Explaining the failure pattern: once a second thread had blocked on the mutex, and the original owner finally released that lock through the old pthread_mutex_unlock path, one of two failure conditions occurred:
1. The original owner implicitly yielded its timeslice to the new acquirer of the mutex. That second thread, having obtained the pthread_mutex_lock, would set its ownership and initialize the refcount to one. When the original thread regained its timeslice, it would UNSET the new thread's ownership and refcount, so the mutex appeared unowned. When the new thread then attempted a nested lock, it wouldn't recognize itself as the mutex owner, so it would deadlock.

2. On a massively parallel (SMP) box, the original thread releasing the mutex would not yield. The original and new threads would both race to unset and set the ownership, respectively. This created a somewhat different race pattern.

Note that the use of memset(&mutex, 0, sizeof mutex) further skewed the behavior, by using a very expensive call to unset what is usually a simple pointer or int.

Because the new patch clears the mutex's ownership state while the lock is still held, the only failure scenario that remains is:

1. The thread is interrupted (e.g. by a signal handler) in between unsetting the ownership (and decrementing the refcount) and actually releasing the mutex. The interrupt handler attempts to perform a nested lock and deadlocks, because the ownership has already been reset but the lock is not yet released.

This one remaining failure case is far less likely than the host of issues possible today. I don't see a simple workaround to avoid this last failure case.

Bill
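For reference, the ordering described above can be sketched roughly as follows. This is only an illustration, not the patch itself: the nested_mutex type, its fields, and the nested_init/nested_lock/nested_unlock names are all hypothetical, and the unlocked ownership check in nested_lock is simply the design under discussion, not a recommendation.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical nested-mutex wrapper; all names here are illustrative. */
typedef struct {
    pthread_mutex_t lock;
    pthread_t owner;   /* meaningful only while owned != 0 */
    int owned;
    int refcount;
} nested_mutex;

int nested_init(nested_mutex *m)
{
    m->owned = 0;
    m->refcount = 0;
    return pthread_mutex_init(&m->lock, NULL);
}

int nested_lock(nested_mutex *m)
{
    int rv;

    /* Nested acquire: the caller already holds the lock, so only
     * the refcount is bumped. */
    if (m->owned && pthread_equal(m->owner, pthread_self())) {
        m->refcount++;
        return 0;
    }

    rv = pthread_mutex_lock(&m->lock);
    if (rv != 0)
        return rv;

    /* First acquire: record ownership while holding the lock. */
    m->owner = pthread_self();
    m->owned = 1;
    m->refcount = 1;
    return 0;
}

int nested_unlock(nested_mutex *m)
{
    /* Only the owning thread may unlock. */
    assert(m->owned && pthread_equal(m->owner, pthread_self()));

    if (--m->refcount > 0)
        return 0;   /* still held by an outer nested_lock */

    /* Clear ownership BEFORE releasing the lock. Doing this after
     * pthread_mutex_unlock would let us wipe out the ownership a
     * newly woken acquirer has just recorded. */
    m->owned = 0;

    /* Remaining window: a signal handler running right here that
     * calls nested_lock sees owned == 0 and blocks on a lock this
     * thread still holds. */
    return pthread_mutex_unlock(&m->lock);
}
```

The old behavior corresponds to swapping the last two steps of nested_unlock, which is what allowed the releasing thread to clobber the new owner's state.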