Looks like a bug to me. Any reason we need nested locks on cross-process mutexes? I'm tempted to just rip out that part and see what breaks.
-aaron On Sat, Nov 16, 2002 at 11:36:20PM +0000, Philip Martin wrote: > Hello > > The following issue was raised in the Subversion issue tracker > > http://subversion.tigris.org/issues/show_bug.cgi?id=996 > > "While tracking down Bug 995, I've found out that apr's testglobalmutex > and testprocmutex are failing for me when running in SMP mode on a > Dual P3. The same tests do not fail when running in Single Processor > mode or running on my Dual Athlon in SMP mode. I am not certain if > this related to Bug 995, but it does appear that httpd is going into a > deadlock and the subversion cmdline client eventually times out. I've > also noticed that apache leaks semaphores when this happens (from > ipcs)." > > I'm not really sure why the submitter chose to report it to > Subversion, and not (as far as I can tell) to APR, but I took a look > at the proc_mutex code and there does appear to be a bug. > > The documentation for apr_proc_mutex_create states that a proc mutex > is intended to synchronize processes. Consider > > APR_DECLARE(apr_status_t) apr_proc_mutex_lock(apr_proc_mutex_t *mutex) > { > apr_status_t rv; > > #if APR_HAS_THREADS > if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) { > mutex->owner_ref++; > return APR_SUCCESS; > } > #endif > > if ((rv = mutex->meth->acquire(mutex)) != APR_SUCCESS) { > return rv; > } > > #if APR_HAS_THREADS > mutex->owner = apr_os_thread_current(); > mutex->owner_ref = 1; > #endif > > return APR_SUCCESS; > } > > The first problem is the line > if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) { > where there is access to the shared data mutex->owner without any sort > of synchronization. Now mutex->owner may not be an atomic type, in > which case a totally bogus value could be obtained, and if it is an > atomic type the unsynchronized access is still pointless. > > However the real problem is that this is supposed to be a *process* > lock, and yet it is comparing *thread* IDs. Thread IDs are distinct > within a process, but there is no guarantee that they are distinct > across mutiple processes. Comparing thread IDs from two separate > processes is undefined behaviour when using POSIX threads. There is > at least one common platform (GNU glibc threads) where thread IDs are > duplicated. When I run APR's testprocmutex test on a 2-way SMP Linux > box it regularly fails (that it doesn't always fail is, I suspect, > because it is not a particularly good test and so the processes often > complete without any mutex contention). > > -- > Philip Martin