Here are some benchmarks I performed on a uniprocessor UltraSparc machine running Solaris 8. The benchmarking code is the same code that W. Richard Stevens used in his UNIX Network Programming: Interprocess Communication, Vol 2, Second Edition (see Appendix A, pp. 463-466). I invite everyone to run these tests on their own platforms in various configurations (I *really* want to run these tests on a big 8-way Sun box :)
Note that *nothing* in these tests will run faster in parallel, so the single concurrency case will be optimal. This is good because it means these tests maximally reflect the performance of the underlying synchronization mechanisms, and are minimally skewed by the ability of the machine to do the basic operation that we are serializing.

The numbers are all in seconds. Each test was run 3 times and the results averaged. The tests themselves consist of a number of concurrent workers (threads or processes), each of which contends for a mutex. Once the mutex is acquired, the active worker simply increments a counter and unlocks. When the counter reaches 1 million, the process prints the time delta and exits.

Multithreaded Results (aka PROCESS_PRIVATE)
-------------------------------------------------------------------------
Lock Mechanism             Concurrency    Total time (sec)
==============             ===========    ================
pthread_mutex                   1               0.4
pthread_mutex                   2               0.7
pthread_mutex                   3               1.1
pthread_mutex                   4               1.5
pthread_mutex                   5               1.8

pthread_rwlock                  1               0.9
pthread_rwlock                  2               1.9
pthread_rwlock                  3               3.1
pthread_rwlock                  4               4.5
pthread_rwlock                  5               8.4

posix memory-based sem.         1               2.7
posix memory-based sem.         2               5.4
posix memory-based sem.         3               8.1
posix memory-based sem.         4              10.8
posix memory-based sem.         5              13.5

posix named sem.                1               7.5
posix named sem.                2              15.1
posix named sem.                3              22.7
posix named sem.                4              30.6
posix named sem.                5              38.5

SysV sem.                       1               4.0
SysV sem.                       2               8.6
SysV sem.                       3              12.5
SysV sem.                       4              16.5
SysV sem.                       5              21.0

SysV sem. w/ UNDO               1               4.7
SysV sem. w/ UNDO               2               9.5
SysV sem. w/ UNDO               3              14.5
SysV sem. w/ UNDO               4              19.1
SysV sem. w/ UNDO               5              23.8

fcntl()                         1              15.4

[Thread concurrency greater than 1 is not possible with fcntl() on Solaris, since fcntl() can only lock between processes, not between threads in the same process. See below for the multiprocess fcntl() results.]
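For anyone who wants to repeat the pthread_mutex case without the book at hand, here is a minimal sketch of the test loop as described above. This is not Stevens' code: the names run_mutex_test, worker and MAX_WORKERS are made up for illustration, and the "stop when the shared counter reaches the limit" logic is my reading of the description, so treat it as an assumption.

```c
#include <pthread.h>
#include <sys/time.h>

#define MAX_WORKERS 16              /* illustrative upper bound, not from the post */

/* Shared state: one process-private mutex protecting one counter. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;
static long target;

/* Each worker contends for the mutex, bumps the counter, and unlocks. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        if (counter >= target) {    /* work is done; release and stop */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        counter++;
        pthread_mutex_unlock(&lock);
    }
}

/*
 * Run nworkers threads until the counter reaches limit.
 * Returns the final counter value, or -1 on error.
 * If elapsed is non-NULL, the wall-clock time in seconds is stored there.
 */
long run_mutex_test(int nworkers, long limit, double *elapsed)
{
    pthread_t tid[MAX_WORKERS];
    struct timeval start, end;
    int i, started = 0;

    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;

    counter = 0;
    target = limit;

    gettimeofday(&start, NULL);
    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    gettimeofday(&end, NULL);

    if (elapsed != NULL)
        *elapsed = (end.tv_sec - start.tv_sec)
                 + (end.tv_usec - start.tv_usec) / 1e6;

    return (started == nworkers) ? counter : -1;
}
```

Calling run_mutex_test(n, 1000000, &secs) for n = 1 through 5 from a small main() and printing secs should give a column comparable in shape to the pthread_mutex rows, though the absolute numbers will depend on the platform.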
Multiprocess Results (aka PROCESS_SHARED)
-------------------------------------------------------------------------
Lock Mechanism             Concurrency    Total time (sec)
==============             ===========    ================
pthread_mutex                   1               0.4
pthread_mutex                   2               0.8
pthread_mutex                   3               1.1
pthread_mutex                   4               1.4
pthread_mutex                   5               1.8

pthread_rwlock                  1               0.8
pthread_rwlock                  2               1.5
pthread_rwlock                  3               2.6
pthread_rwlock                  4               4.3
pthread_rwlock                  5               6.2

posix memory-based sem.         1               7.4
posix memory-based sem.         2              14.9
posix memory-based sem.         3              22.6
posix memory-based sem.         4              29.6
posix memory-based sem.         5              37.2

posix named sem.                1               7.7
posix named sem.                2              14.9
posix named sem.                3              22.4
posix named sem.                4              29.9
posix named sem.                5              37.4

SysV sem.                       1               4.1
SysV sem.                       2               8.4
SysV sem.                       3              12.0
SysV sem.                       4              16.1
SysV sem.                       5              20.3

SysV sem. w/ UNDO               1               5.0
SysV sem. w/ UNDO               2               9.8
SysV sem. w/ UNDO               3              14.4
SysV sem. w/ UNDO               4              19.3
SysV sem. w/ UNDO               5              23.7

fcntl()                         1              15.4
fcntl()                         2              40.6
fcntl()                         3              61.2
fcntl()                         4              89.0
fcntl()                         5             118.8

[Note: the lock file used here was in the /tmp directory. Lock files on a non-RAM-based filesystem were significantly slower, and lock files on an NFS partition were even worse than that.]

Commentary:
---------------------
From the perspective of APR, choosing the correct underlying lock mechanism can be very difficult. Trying to match a general-use mutual exclusion mechanism to a particular platform with a particular configuration may involve too many variables to deal with at build-time (or even run-time). I'm not making any assertions here about which locking mechanisms we should or should not be using, but I think we should gather some more data and revisit this problem.

When we look at this merely from the perspective of solving the accept() mutex problem in httpd, we have fewer variables to deal with (CROSS_PROCESS vs. LOCKALL), but the essence of the problem still remains. The above results don't reflect other versions of Solaris, nor do they reflect what happens on a multiprocessor machine.
My hope is that this will give us something to chew on for a while.

-aaron
