PThread cond_wait bug w/nested mutex

Mark Gooderum 22 Aug 2003 10:43:31 -0000

I've discovered what I think it a bug in the pthreads based cond_wait() code when used with nested mutexes.

The "proper" (per my experience) way to use a condition variable in the pthreads world is a squence like:Waiter Notifier ---------- ------------- mutex_lock cond_wait (pthreads atomically surrenders lock and waits) mutex_lock cond_signal mutex_unlock (pthread atomically requires the mutex and wakes up) mutex_unlock
But if the waiter code is for instance in a library and is called with the nested mutex already locked the lock/cond_wait/unlock cycle always clears the mutex. So a cycle like:

Waiter Notifier ---------- ------------- mutex_lock mutex_lock cond_wait (pthreads atomically surrenders lock and waits) mutex_lock cond_signal mutex_unlock (pthread atomically requires the mutex and wakes up) mutex_unlock mutex_unlock

is broken - Details follow - what happens is the mutex_lock/unlock cycle in the notifier trashes the APR portions of the mutex.

APR_DECLARE(apr_status_t) apr_thread_cond_wait(apr_thread_cond_t *cond, apr_thread_mutex_t *mutex){ apr_status_t rv; rv = pthread_cond_wait(cond->cond, &mutex->mutex);#ifdef PTHREAD_SETS_ERRNO if (rv) { rv = errno; }#endif}

The owner and owner_ref fields of the APR mutex structure are left alone. The caller does a mutex_lock which works and destroys the owner/owner_ref fields:

APR_DECLARE(apr_status_t) apr_thread_mutex_lock(apr_thread_mutex_t *mutex)
{
    apr_status_t rv;

    if (mutex->nested) {
---->This doesn't hit because owner is the other waiting thread:
        if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) {
            mutex->owner_ref++;
            return APR_SUCCESS;
        }
--->So we succeed in "stealing" the mutex
        rv = pthread_mutex_lock(&mutex->mutex);
        if (rv) {
#ifdef PTHREAD_SETS_ERRNO
            rv = errno;
#endif
            return rv;
        }

        mutex->owner = apr_os_thread_current();
        mutex->owner_ref = 1;
        return rv;
    }
    else {
        rv = pthread_mutex_lock(&mutex->mutex);
#ifdef PTHREAD_SETS_ERRNO
        if (rv) {
            rv = errno;
        }
#endif
        return rv;
    }
}

When the notifier than calls mutex_unlock() the pthread mutex is released and the mutex owner/owner_ref fields zeroed out.

So now the waiter thread gets the pthread mutex inside pthread_cond_wait() and wakes up. The call returns to the caller. The caller then trys to release the mutex - this call succeeds because we always release the mutex. But the mutex is now fully released - so if the caller is counting on holding the mutex with nesting he's hosed.

APR_DECLARE(apr_status_t) apr_thread_mutex_unlock(apr_thread_mutex_t *mutex)
{
    apr_status_t status;

    if (mutex->nested) {
        ---->This fails so it just releases the pthread_mutex (incorrectly if my original nest count was > 1)
        if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) {
            mutex->owner_ref--;
            if (mutex->owner_ref > 0)
                return APR_SUCCESS;
        }
        status = pthread_mutex_unlock(&mutex->mutex);
        if (status) {
#ifdef PTHREAD_SETS_ERRNO
            status = errno;
#endif
            return status;
        }

        memset(&mutex->owner, 0, sizeof mutex->owner);
        mutex->owner_ref = 0;
        return status;
    }
    else {
        status = pthread_mutex_unlock(&mutex->mutex);
#ifdef PTHREAD_SETS_ERRNO
        if (status) {
            status = errno;
        }
#endif
        return status;
    }
}

The fix seems to be to save and restore the owner/owner_ref fields around the pthread_cond_wait() call. The same thing applies to the timed version of the wait but I used the shorter version here for clarity. P.S.. - I'm not subscribed to the dev list so please CC me on replies, thanks.
--
Mark Gooderum
Vernier Networks, Inc.
[EMAIL PROTECTED]

PThread cond_wait bug w/nested mutex

Reply via email to