Comments below...
On Wednesday, July 30, 2003, at 09:21 AM, William A. Rowe, Jr. wrote:
Attached is a patch that appears to be as close as thread-safe (but altogether not reentrant) as I can accomplish.
The existing unix/thread_mutex.c code is neither thread-safe nor reentrant when using the 'nested' (default) thread mutexes. The bug is most often expressed on SMP boxes, where they are so massively parallel that the subtle flaws of modifying the mutex object AFTER we have released the lock become very apparent.
Examples include a number of error (45) Deadlock Avoided alerts using Apache2/mod_ssl on Solaris, where there are both internal thread locks and global locks which contain a nested thread lock.
Many more eyes on this code would be welcomed, I expect to discover that we repeat this mistake elsewhere in the mutex code...
Index: srclib/apr/thread_mutex.c =================================================================== RCS file: /home/cvs/apr/locks/unix/thread_mutex.c,v retrieving revision 1.18 diff -u -r1.18 thread_mutex.c --- srclib/apr/thread_mutex.c 8 Jun 2003 13:42:20 -0000 1.18 +++ srclib/apr/thread_mutex.c 30 Jul 2003 16:09:41 -0000 @@ -108,8 +108,11 @@ apr_status_t rv;
if (mutex->nested) { + /* + * WARNING: although threadsafe, this test is NOT reentrant + */
I don't know what you mean by reentrant here, could you explain?
if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) {
- mutex->owner_ref++;
+ ++mutex->owner_ref;
Does this make a difference? (I think an incr is not threadsafe any way we
look at it, but I could be wrong (It's really a load/incr/store, right?).)
return APR_SUCCESS; }
@@ -121,8 +124,13 @@ return rv; }
+ if (++mutex->owner_ref != 1) {
+ /* The owner_ref should be zero when the lock is not held,
+ * if owner_ref was non-zero we have a mutex reference bug.
+ */
+ mutex->owner_ref = 1;
+ }
mutex->owner = apr_os_thread_current();
- mutex->owner_ref = 1;
return rv;
}
else {
@@ -141,8 +149,11 @@
apr_status_t rv;
if (mutex->nested) {
+ /*
+ * WARNING: although threadsafe, this test is NOT reentrant
+ */
if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) {
- mutex->owner_ref++;
+ ++mutex->owner_ref;
return APR_SUCCESS;
}
@@ -154,8 +165,13 @@ return (rv == EBUSY) ? APR_EBUSY : rv; }
+ if (++mutex->owner_ref != 1) {
+ /* The owner_ref should be zero when the lock is not held,
+ * if owner_ref was non-zero we have a mutex reference bug.
+ */
+ mutex->owner_ref = 1;
+ }
mutex->owner = apr_os_thread_current();
- mutex->owner_ref = 1;
}
else {
rv = pthread_mutex_trylock(&mutex->mutex);
@@ -176,31 +192,30 @@
if (mutex->nested) {
if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) {
- mutex->owner_ref--;
- if (mutex->owner_ref > 0)
+ /*
+ * WARNING: although threadsafe, this test is NOT reentrant
+ */
+ if (--mutex->owner_ref > 0)
return APR_SUCCESS;
}
- status = pthread_mutex_unlock(&mutex->mutex);
- if (status) {
-#ifdef PTHREAD_SETS_ERRNO
- status = errno;
-#endif
- return status;
- }
+ mutex->owner = 0;
- memset(&mutex->owner, 0, sizeof mutex->owner);
- mutex->owner_ref = 0;
- return status;
+ if (mutex->owner_ref < 0) {
+ /* The owner_ref should be zero when the lock is finally released,
+ * if owner_ref is non-zero we have a mutex reference bug.
+ */
+ mutex->owner_ref = 0;
+ }
}
- else {
- status = pthread_mutex_unlock(&mutex->mutex);
+
+ status = pthread_mutex_unlock(&mutex->mutex);
#ifdef PTHREAD_SETS_ERRNO
- if (status) {
- status = errno;
- }
-#endif
- return status;
+ if (status) {
+ status = errno;
}
+#endif
+
+ return status;
}
APR_DECLARE(apr_status_t) apr_thread_mutex_destroy(apr_thread_mutex_t *mutex)
This last chunk seems to be the meat of at least one of the problems.
Still, even though this change is good in that we only mess with
the mutex->owner stuff when the lock is held, we aren't protecting
mutex->owner in other places (eg. in the places above that you pointed out)
Do we really need nested mutexes? The correct solution might be to add another
internal pthread_mutex that only gets used when we want nested mutexes,
and only gets locked when we want to read or modify those mutex->owner
variables. This might make things a little slower than they would be
now (only slightly), but it would be correct if nested mutexes are
really a necessary feature.
-aaron