Philip Martin <[EMAIL PROTECTED]> writes:

> The first problem is the line
>     if (apr_os_thread_equal(mutex->owner, apr_os_thread_current())) {
> where there is access to the shared data mutex->owner without any sort
> of synchronization. Now mutex->owner may not be an atomic type, in
> which case a totally bogus value could be obtained, and if it is an
> atomic type the unsynchronized access is still pointless.
>
> However the real problem is that this is supposed to be a *process*
> lock, and yet it is comparing *thread* IDs. Thread IDs are distinct
> within a process, but there is no guarantee that they are distinct
> across multiple processes. Comparing thread IDs from two separate
> processes is undefined behaviour when using POSIX threads. There is
> at least one common platform (GNU glibc threads) where thread IDs are
> duplicated. When I run APR's testprocmutex test on a 2-way SMP Linux
> box it regularly fails (that it doesn't always fail is, I suspect,
> because it is not a particularly good test and so the processes often
> complete without any mutex contention).
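
To illustrate the point about thread IDs only being meaningful inside one process, here is a minimal sketch of an owner record that pairs the thread ID with a process ID. The struct and function names (lock_owner, owner_clear, owner_set_self, owner_is_self) are hypothetical and are not APR's; the explicit "valid" flag is also an assumption of the sketch. It does nothing about the unsynchronized read, so callers would still need to serialize access to the record.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical owner record; this is not APR's mutex layout. */
struct lock_owner {
    bool      valid;   /* true only while some thread holds the lock */
    pid_t     pid;     /* process of the holder */
    pthread_t thread;  /* thread of the holder, meaningful only with pid */
};

/* Mark the record as having no owner, without inventing a thread ID. */
static void owner_clear(struct lock_owner *o)
{
    assert(o != NULL);
    o->valid = false;
}

/* Record the calling thread of the calling process as the owner. */
static void owner_set_self(struct lock_owner *o)
{
    assert(o != NULL);
    o->pid = getpid();
    o->thread = pthread_self();
    o->valid = true;
}

/* True if the calling thread is the recorded owner. */
static bool owner_is_self(const struct lock_owner *o)
{
    assert(o != NULL);
    /* The pid is checked first, so pthread_equal only ever sees a
     * thread ID that was obtained in this same process. */
    return o->valid
        && o->pid == getpid()
        && pthread_equal(o->thread, pthread_self());
}
```

With this shape, a record filled in by one process never reaches pthread_equal in another, because the pid comparison fails first.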

I've looked at the proc mutex code again, and things are less clear.
I see now that apr_proc_mutex_create and apr_proc_mutex_unlock both
set the mutex->owner field to zero to indicate an invalid mutex. This
is not valid for a POSIX thread system because

  a) zero may be a valid thread ID, and
  b) passing an "invented" thread ID to pthread_equal is undefined
     behaviour.

However it may well work on a Linux glibc 2.2.5 system, where I
believe pthread_t is an unsigned long and zero is not used as a thread
ID. It also means that my complaint about comparing thread IDs from
different processes does not apply (I was assuming mutex->owner was
initialized to the thread ID of the thread that created the proc
mutex, as that is the only available valid thread ID).

Despite the problems with the proc mutex code, I cannot identify one
that would cause the test failure I am seeing. I'm running the test on
an SMP (dual P3) Linux machine and it fails about one run in three.
Here is the output of a typical failure:

  $ ./testprocmutex
  APR Proc Mutex Test
  ==============
  Exclusive lock test
  Initializing the lock OK
  Starting all of the processes OK
  Waiting for processes to exit OK
  Locks don't appear to work! x = 15998 instead of 16000

The test is multi-process, but the processes are single-threaded. I
guess the problem lies somewhere in the semaphores, but I don't have
any experience of using those.

-- 
Philip Martin
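
For anyone who wants to poke at the semaphore side in isolation, here is a minimal sketch of the same kind of check: several single-threaded processes increment a counter in shared memory while holding a lock, and the parent compares the final value with the expected total. This is not the testprocmutex source. It assumes a System V semaphore as the lock and an anonymous shared mapping for the counter; whether APR uses that mechanism on this platform is not something I have confirmed. The names run_counter_test, sem_adjust and sem_arg are made up for the example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Local stand-in for "union semun", which Linux leaves to the caller. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Add delta to semaphore 0 of the set, retrying if interrupted. */
static int sem_adjust(int semid, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = SEM_UNDO };
    int rc;
    do {
        rc = semop(semid, &op, 1);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Fork nprocs children that each increment a shared counter iters
 * times under the semaphore. Returns the final counter, or -1 on error. */
static int run_counter_test(int nprocs, int iters)
{
    assert(nprocs > 0 && iters >= 0);

    int *x = mmap(NULL, sizeof *x, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (x == MAP_FAILED)
        return -1;
    *x = 0;

    /* One semaphore with initial value 1 acts as the exclusive lock. */
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    union sem_arg arg = { .val = 1 };
    if (semid == -1 || semctl(semid, 0, SETVAL, arg) == -1) {
        if (semid != -1)
            semctl(semid, 0, IPC_RMID);
        munmap(x, sizeof *x);
        return -1;
    }

    int started = 0, ok = 1;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            ok = 0;
            break;
        }
        if (pid == 0) {
            for (int j = 0; j < iters; j++) {
                if (sem_adjust(semid, -1) == -1)   /* lock */
                    _exit(1);
                *x = *x + 1;                       /* critical section */
                if (sem_adjust(semid, +1) == -1)   /* unlock */
                    _exit(1);
            }
            _exit(0);
        }
        started++;
    }

    /* Reap every child; any abnormal exit invalidates the run. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }

    int result = ok ? *x : -1;
    semctl(semid, 0, IPC_RMID);
    munmap(x, sizeof *x);
    return result;
}
```

Calling run_counter_test(4, 4000) from a main function should return 16000 if the lock holds; the 4 x 4000 split is an arbitrary choice for the sketch, not the real test's parameters. A smaller value would mean increments were lost, which is the symptom in the output above.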