Hello Yann,

On 18.04.2017 at 23:37, Yann Ylavic wrote:
On Tue, Apr 18, 2017 at 4:25 PM, Rainer Jung <rainer.j...@kippdata.de> wrote:
On 18.04.2017 at 00:16, Yann Ylavic wrote:

All of the above is hopefully fixed with r1791718 and r1791728, both
backported to 1.6.x.

Sorry to say that, but there's now another strange test result on Solaris 8.
Again testprocmutex. It fails for mech "default_timed" in test_exclusive
here:

   209          rv = apr_proc_mutex_unlock(proc_lock);
   210          APR_ASSERT_SUCCESS(tc, "unlock after timedlock check", rv);
   211
   212          *x = 0;
   213
   214          for (n = 0; n < CHILDREN; n++)
   215              make_child(tc, -1, &child[n], p);
   216
   217          for (n = 0; n < CHILDREN; n++)
   218              await_child(tc, child[n]);
                    ^^^^^^^^^^^

In await_child, while waiting for child number 0, apr_proc_wait returns
after 75 seconds with code 1.

When I tried to use truss, I had to terminate it after a few minutes, because
it wasn't terminating by itself. I had about 550.000 lwp_mutex_lock and
590.000 lwp_mutex_unlock calls for each of the 6 threads,

Do you really mean more unlocks than locks?
I don't see how locks/unlocks could be unbalanced in
proc_mutex_pthread_acquire_ex() and proc_mutex_pthread_release().

For each iteration of testprocmutex's child process loop, there should be a
maximum of 2 locks+unlocks per proc_mutex_pthread_acquire_ex() (given that
pthread_cond_timedwait() itself releases and re-acquires the mutex if
it's called), and 1 lock+unlock per proc_mutex_pthread_release().
That's something like 3 x CHILDREN x MAX_ITER = 3600 locks and as many
unlocks for the whole "default_timed" test, if no timeout occurs!
According to your numbers, that'd mean ~150 timeouts/retries on a
total of 1200 loops (~12.5%).

and also about
40.000 calls per thread of:

lwp_cond_wait(0xFF040018, 0xFF040000, 0xFFBEF868) Err#62 ETIME

So < 10% of the 2400 locks+unlocks from proc_mutex_pthread_acquire_ex()
come from pthread_cond_timedwait(), which means most of the
apr_proc_mutex_timedlock() calls end up being a simple {pthread_mutex_lock();
locked=1; pthread_mutex_unlock()}, and that sounds to me like a costly
mutex implementation on Solaris 8...
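
For reference, here is a simplified sketch of how such a condvar-based timed
acquire/release pair typically looks (not the exact APR code; the struct is
assumed to live in shared memory with PTHREAD_PROCESS_SHARED attributes, and
error handling is trimmed):

#include <pthread.h>
#include <errno.h>

typedef struct {
    pthread_mutex_t mutex;   /* protects 'locked' */
    pthread_cond_t  cond;    /* signaled on release */
    int             locked;  /* 1 while some process holds the lock */
} shared_timed_lock;

static int timed_acquire(shared_timed_lock *sl, const struct timespec *deadline)
{
    int rc = pthread_mutex_lock(&sl->mutex);             /* lock #1 */
    if (rc)
        return rc;
    while (sl->locked) {
        /* releases and re-acquires the mutex internally => lock #2 */
        rc = pthread_cond_timedwait(&sl->cond, &sl->mutex, deadline);
        if (rc) {                                        /* ETIMEDOUT or error */
            pthread_mutex_unlock(&sl->mutex);
            return rc;
        }
    }
    sl->locked = 1;
    return pthread_mutex_unlock(&sl->mutex);
}

static int timed_release(shared_timed_lock *sl)
{
    int rc = pthread_mutex_lock(&sl->mutex);             /* one lock+unlock */
    if (rc)
        return rc;
    sl->locked = 0;
    pthread_cond_signal(&sl->cond);
    return pthread_mutex_unlock(&sl->mutex);
}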


Any idea what to look for in the condvar impl of the timedlock feature?

Since I wrote it I may be missing something, but I see nothing wrong yet...
We could still optimize it for the uncontended case by using atomics on
the shared proc_pthread_mutex_t->cond_locked state.
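Just to illustrate the idea (a rough sketch only, names and types assumed,
not APR's actual internals; the release side would still have to take the
mutex before signaling if a waiter may be blocked in pthread_cond_timedwait()):

#include "apr_atomic.h"

/* Fast path for the uncontended case, assuming cond_locked were stored as
 * an apr_uint32_t in the shared segment.  If the CAS from 0 to 1 succeeds
 * we own the lock without touching the pthread mutex at all; otherwise the
 * caller falls back to the existing mutex + condvar slow path. */
static int try_fast_acquire(volatile apr_uint32_t *cond_locked)
{
    /* apr_atomic_cas32() returns the previous value: 0 means we took it */
    return (apr_atomic_cas32(cond_locked, 1, 0) == 0);
}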

How does the following patch work for you, waiting a somewhat longer
time on the mutex in the test (here 10ms)?

Index: test/testprocmutex.c
===================================================================
--- test/testprocmutex.c    (revision 1791728)
+++ test/testprocmutex.c    (working copy)
@@ -87,11 +87,12 @@ static void make_child(abts_case *tc, int trylock,
             }
             else if (trylock < 0) {
                 int wait_usec = 0;
+                apr_interval_time_t timeout = apr_time_from_msec(10);

-                while ((rv = apr_proc_mutex_timedlock(proc_lock, 1))) {
+                while ((rv = apr_proc_mutex_timedlock(proc_lock, timeout))) {
                     if (!APR_STATUS_IS_TIMEUP(rv))
                         exit(1);
-                    if (++wait_usec >= MAX_WAIT_USEC)
+                    if (++wait_usec >= MAX_WAIT_USEC / timeout)
                         exit(1);
                 }
             }


Thanks for pointing me at that code. I tried various timeouts. The test on Solaris 10 nearly always fails here *after some time*.

The wait_usec loop trying apr_proc_mutex_timedlock() very often succeeds already on the first or second, sometimes the third attempt. But then the outer loop "while (i < MAX_ITER)" suddenly starts to fail, because the inner loop switches behavior and apr_proc_mutex_timedlock() always gets the timeout until we stop trying due to wait_usec. Increasing MAX_WAIT_USEC from 1 sec to 5 sec didn't help. I have no idea what the root cause of this behavior switch is.

I also tried the Solaris-specific pthread_cond_reltimedwait_np() instead of pthread_cond_timedwait() to rule out time calculation problems and deltas between the clock APR might use and the clock the system implementation uses. That didn't help though.
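
For reference, the difference boils down to an absolute deadline versus a
relative timeout; a minimal sketch of both variants (assuming CLOCK_REALTIME
is the condvar's clock and ignoring error handling):

#include <pthread.h>
#include <time.h>

/* Portable variant: compute an absolute deadline, pass it to
 * pthread_cond_timedwait(). */
static int wait_abs(pthread_cond_t *cv, pthread_mutex_t *mx, long usec)
{
    struct timespec abstime;
    clock_gettime(CLOCK_REALTIME, &abstime);   /* must match the condvar's clock */
    abstime.tv_sec  += usec / 1000000;
    abstime.tv_nsec += (usec % 1000000) * 1000;
    if (abstime.tv_nsec >= 1000000000L) {
        abstime.tv_sec++;
        abstime.tv_nsec -= 1000000000L;
    }
    return pthread_cond_timedwait(cv, mx, &abstime);
}

#ifdef __sun
/* Solaris-specific variant: the timeout is relative to "now", so no clock
 * conversion is involved at all. */
static int wait_rel(pthread_cond_t *cv, pthread_mutex_t *mx, long usec)
{
    struct timespec reltime;
    reltime.tv_sec  = usec / 1000000;
    reltime.tv_nsec = (usec % 1000000) * 1000;
    return pthread_cond_reltimedwait_np(cv, mx, &reltime);
}
#endif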

It also does not seem to be a problem with spurious wakeups or early returns from the timed wait.

Regards,

Rainer
