Philip Martin <[EMAIL PROTECTED]> writes:

> Jeff Trawick <[EMAIL PROTECTED]> writes:
>
> > I think the problem is in the test programs (e.g., testprocmutex).  As
> > soon as one child hits the specified number of iterations, the child
> > will exit and something like this will happen:
> >
> >   #0  proc_mutex_sysv_cleanup (mutex_=0x804e608) at proc_mutex.c:205
> >   #1  0x4002dfa1 in run_cleanups (c=0x804e660) at apr_pools.c:1973
> >   #2  0x4002d59e in apr_pool_destroy (pool=0x804e4f8) at apr_pools.c:755
> >   #3  0x4002d58a in apr_pool_destroy (pool=0x804a4e8) at apr_pools.c:752
> >   #4  0x4002d24a in apr_pool_terminate () at apr_pools.c:585
> >   #5  0x4002a45d in apr_terminate () at start.c:117
> >
> > And of course as soon as semctl(IPC_RMID) is done, the lock is broken.
>
> No question in my mind.  The problem is that Aaron cannot reproduce
> it...
It's not a problem at all that Aaron can't hit it :)  The code is
clearly broken just by inspection.

Even with my setup, which never showed testprocmutex failing, strace
shows the mutex getting cleaned up by one process while other processes
are still trying to use it.  And while we know that broken mutexes
*may* lead to the sort of bogosity that testprocmutex checks for, the
counter doesn't automatically get out of sync just because the mutexes
aren't used, so testprocmutex will often *not* notice the snafu...

[pid 21884] semop(4915210, 0x400315b2, 1 <unfinished ...>
[pid 21881] semctl(4915210, 0, 0x100 /* SEM_??? */, 0xbffff598 <unfinished ...>
[pid 21884] <... semop resumed> )  = 0
[pid 21883] <... semop resumed> )  = -1 EIDRM (Identifier removed)
[pid 21881] <... semctl resumed> ) = 0
[pid 21884] semop(4915210, 0x400315ac, 1 <unfinished ...>
[pid 21882] <... semop resumed> )  = -1 EIDRM (Identifier removed)
[pid 21881] shmctl(11698257, 0x100 /* SHM_??? */, 0 <unfinished ...>
[pid 21884] <... semop resumed> )  = -1 EINVAL (Invalid argument)

-- 
Jeff Trawick | [EMAIL PROTECTED]
Born in Roswell... married an alien...