Thanks Rainer for the detailed report (as usual).

On Mon, Apr 17, 2017 at 2:01 PM, Rainer Jung <rainer.j...@kippdata.de> wrote:
>
> c) Failure to compile apr on Solaris 8
> --------------------------------------
>
> Using gcc 4.1.2 compiling locks/unix/proc_mutex.c:
>
> In function 'proc_mutex_pthread_create':
> 607: error: lvalue required as left operand of assignment
> In function 'proc_mutex_pthread_acquire_ex':
> 711: error: lvalue required as left operand of assignment
> 790: warning: implicit declaration of function 'pthread_mutex_timedlock'
> In function 'proc_mutex_pthread_release':
> 868: error: lvalue required as left operand of assignment
> In function 'proc_mutex_pthread_cond_create':
> 945: error: lvalue required as left operand of assignment
>
> Concerning pthread_mutex_timedlock: that platform has
> HAVE_PTHREAD_CONDATTR_SETPSHARED set to 1 but HAVE_PTHREAD_MUTEX_TIMEDLOCK
> is not defined (and pthread_mutex_timedlock not available). So
> APR_USE_PROC_PTHREAD_MUTEX_COND is defined. So in line 688 we enter the
> "#else" branch which ends in line 815. That code includes a call to
> pthread_mutex_timedlock.
>
> Concerning the "lvalue required as left operand of assignment" the lines are
> always of the form
>
> proc_pthread_mutex_cond_locked(SOME_MUTEX) = SOME_INT;
>
> and proc_pthread_mutex_cond_locked(m) is defined as
>
> ((m)->pthread_refcounting ? proc_pthread_cast(m)->cond_locked : -1)
>
> which indeed doesn't look like a good lvalue.
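
For reference, the lvalue problem can be reproduced in isolation. The sketch below uses simplified stand-in names (struct mutex, struct shared_state, mutex_cond_locked, mutex_set_cond_locked are all hypothetical, not the APR definitions), and the setter macro is only one possible way around it, not necessarily what the actual fix does:

```c
#include <assert.h>

/* Simplified stand-ins for the APR structures (hypothetical names). */
struct shared_state { int cond_locked; };
struct mutex { int refcounting; struct shared_state *shared; };

/* Read-only accessor: in C the result of a conditional expression is
 * not an lvalue, so this macro cannot appear on the left of '='. */
#define mutex_cond_locked(m) \
    ((m)->refcounting ? (m)->shared->cond_locked : -1)

/* Separate setter that writes only when the shared state is in use. */
#define mutex_set_cond_locked(m, v) \
    do { if ((m)->refcounting) (m)->shared->cond_locked = (v); } while (0)

int demo_lvalue_fix(void)
{
    struct shared_state s = { 0 };
    struct mutex m = { 1, &s };

    /* mutex_cond_locked(&m) = 1;  <- rejected: "lvalue required ..." */
    mutex_set_cond_locked(&m, 1);

    return mutex_cond_locked(&m);
}
```

Splitting the read and write paths keeps the "-1 when not refcounting" semantics for readers while giving writers a real assignment target.
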
Sorry about that, I thought I had tested the APR_USE_PROC_PTHREAD_MUTEX_COND
case with some #undef's on Linux, but it seems I missed it for my latest
changes.
All of the above should hopefully be fixed by r1791718 and r1791728, both
backported to 1.6.x.

>
> d) Hang during APR make check on Solaris 10 (testprocmutex)
> -----------------------------------------------------------
>
> pthread_mutex_timedlock() hangs when the current thread already has locked
> the mutex.
>
> GDB says:
>
> #0 0xff14ca00 in ___lwp_mutex_timedlock () from /lib/libc.so.1
> No symbol table info available.
> #1 0xff13fde8 in mutex_lock_kernel () from /lib/libc.so.1
> No symbol table info available.
> #2 0xff140f4c in stall () from /lib/libc.so.1
> No symbol table info available.
> #3 0xff1416dc in mutex_lock_internal () from /lib/libc.so.1
> No symbol table info available.
> #4 0xff141cc0 in pthread_mutex_timedlock () from /lib/libc.so.1
> No symbol table info available.
> #5 0xff369310 in proc_mutex_pthread_acquire_ex (mutex=0xb4ff8,
> timeout=1492421091353146)
> at .../locks/unix/proc_mutex.c:790
> abstime = {tv_sec = 1492421091, tv_nsec = 353146000}
> rv = <optimized out>
> #6 0xff36a500 in apr_proc_mutex_timedlock (mutex=0xb4ff8, timeout=1) at
> .../locks/unix/proc_mutex.c:1558
> No locals.
> #7 0x00029f90 in test_exclusive (lockname=0x0, mech=0xffbffad8,
> tc=0xffbffa70) at .../test/testprocmutex.c:197
> child = {0xb5118, 0xb5128, 0xb5138, 0xb5148, 0xb5158, 0xb5168}
> rv = 0
> n = 0
> #8 proc_mutex (tc=0xffbffa70, data=0xffbffad8) at
> .../test/testprocmutex.c:243
> rv = <optimized out>
> shmname = 0x37ad0 "tpm.shm"
> shm = 0xb4fc8
> #9 0x00015a4c in abts_run_test (ts=ts@entry=0xb7b70, f=f@entry=0x29ccc
> <proc_mutex>, value=value@entry=0xffbffad8)
> at .../test/abts.c:171
> tc = {failed = 0, suite = 0xb76a8}
> ss = 0xb76a8
> #10 0x0002a26c in testprocmutex (suite=0xb7b70) at
> .../test/testprocmutex.c:274
> lockmechs = {{num = APR_LOCK_DEFAULT, name = 0x37d70 "default"},
> {num = APR_LOCK_SYSVSEM, name = 0x37d78 "sysvsem"}, {num =
> APR_LOCK_POSIXSEM, name = 0x37d80 "posix"},
> {num = APR_LOCK_FCNTL, name = 0x37d88 "fcntl"}, {num =
> APR_LOCK_PROC_PTHREAD, name = 0x37d90 "proc_pthread"}, {num =
> APR_LOCK_DEFAULT_TIMED,
> name = 0x37da0 "default_timed"}}
> #11 0x000336c0 in main (argc=2, argv=<optimized out>) at .../test/abts.c:429
> i = <optimized out>
> list_provided = <optimized out>
> suite = 0xb7b70
>
> and the timestamp given is Mon Apr 17 11:24:51 2017, which looks OK. So for
> some reason the timedlock doesn't time out when trying to acquire the
> proc_mutex a second time.
>
> truss (Solaris variant of strace) shows for the first successful call:
>
> lwp_mutex_timedlock(0xFF1D0000, 0xFFBFF880) = 0
> mutex type: USYNC_PROCESS|LOCK_PRIO_INHERIT|LOCK_ROBUST
> timeout: 0.000000000 sec
>
> and then for the hanging subsequent call by the same thread:
>
> lwp_mutex_timedlock(0xFF1D0000, 0xFFBFF880) Err#45 EDEADLK
> mutex type: USYNC_PROCESS|LOCK_PRIO_INHERIT|LOCK_ROBUST
> timeout: 0.000000000 sec
> lwp_mutex_timedlock(0xFF1B5778, 0x00000000) (sleeping...)
> mutex type: USYNC_THREAD
>
> Assuming the Solaris code is similar to:
>
> https://github.com/OpenIndiana/illumos-gate/blame/master/usr/src/lib/libc/port/threads/synch.c
>
> maybe my Solaris 10 does not have the following code change in it:
>
> https://github.com/OpenIndiana/illumos-gate/commit/f52756fb59521fc0f684db03ee24da2a1d12a52a
>
> "6738798 pthread_mutex_timedlock can block forever when using priority
> inherit mutexes"
>
> When running the same testall binary on Solaris 11, I do not get these
> hangs.
>
> Unfortunately I was not (yet) able to find out, whether there's a patch for
> Bug 6738798 available on Solaris 10, or whether we would break Solaris 10.

Maybe we need to "#define APR_USE_PROC_PTHREAD_MUTEX_COND 0" for Solaris 10,
and fall back to the generic implementation (spinning sleep)...

Regards,
Yann.
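
PS: to make "spinning sleep" concrete, here is a rough sketch of a timed lock built only on pthread_mutex_trylock() and nanosleep(), so it never enters pthread_mutex_timedlock(). This is not the APR code: spin_timedlock, demo_spin_timedlock and the 1 ms polling step are made up for illustration, and the timeout accounting is only approximate since it ignores oversleeping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: poll the mutex until it is acquired or roughly
 * timeout_us microseconds have elapsed. Returns 0 on success, ETIMEDOUT
 * on timeout, or whatever other error pthread_mutex_trylock() reports. */
static int spin_timedlock(pthread_mutex_t *m, long timeout_us)
{
    const long step_us = 1000;          /* arbitrary polling interval */
    struct timespec nap;

    nap.tv_sec = 0;
    nap.tv_nsec = step_us * 1000;

    for (;;) {
        int rv = pthread_mutex_trylock(m);
        if (rv != EBUSY)
            return rv;                  /* acquired, or a real error */
        if (timeout_us <= 0)
            return ETIMEDOUT;
        nanosleep(&nap, NULL);          /* sleep, then retry */
        timeout_us -= step_us;
    }
}

/* Mirrors the failing test case: the second attempt by the same thread
 * must time out instead of blocking forever. */
int demo_spin_timedlock(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int first = spin_timedlock(&m, 5000);
    int second = spin_timedlock(&m, 5000);

    pthread_mutex_unlock(&m);
    return first == 0 && second == ETIMEDOUT;
}
```

The obvious cost is latency and wakeups while waiting, which is why it would only be a fallback for platforms where the native timed lock cannot be trusted.
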