Isn't the call to find the best balancer mutex-protected?
> On Aug 31, 2023, at 7:44 AM, jean-frederic clere <jfcl...@gmail.com> wrote:
>
> On 8/30/23 17:33, Rainer Jung wrote:
>> Hi JFC,
>> I have not checked your current code, but the topic reminds me of our history
>> in mod_jk land. There we switched the counters to atomics where available.
>> The other problematic part could be how to handle process-local counters
>> versus global counters.
>
> Using apr_atomic_inc32()/apr_atomic_dec32() on an apr_size_t busy won't work?
> Actually, apr_size_t for busy is probably overkill; does using apr_atomic_add64()
> and apr_atomic_dec64() make sense here?
>
> Anyway I will give it a try.
>
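For reference, a minimal sketch of what that could look like, assuming the
shared busy field were widened to a volatile apr_uint64_t so that APR's 64-bit
atomics (available since APR 1.7) apply; the busy_* helpers are made-up names
for illustration, not existing httpd code:

#include "apr_atomic.h"

/* Sketch only: assumes the worker's shared busy counter has been changed
 * from apr_size_t to a volatile apr_uint64_t so the 64-bit atomics apply. */

static void busy_inc(volatile apr_uint64_t *busy)
{
    apr_atomic_add64(busy, 1);       /* a request was assigned to the worker */
}

static void busy_dec(volatile apr_uint64_t *busy)
{
    apr_atomic_dec64(busy);          /* the request finished */
}

static apr_uint64_t busy_get(volatile apr_uint64_t *busy)
{
    return apr_atomic_read64(busy);  /* e.g. for display in balancer_handler() */
}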
>> Busyness was especially problematic for mod_jk as well, because we never
>> decremented below zero if we lost increments, but if we lost decrements the
>> counters stayed elevated. I think we no longer have such problems there.
>> Best regards,
>> Rainer
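As an illustration of the "never decrement below zero" behaviour Rainer
describes (a sketch under the same 64-bit assumption, not mod_jk's actual
code), a compare-and-swap loop can enforce the floor:

#include "apr_atomic.h"

/* Sketch: decrement busy but never let it underflow below zero.
 * Assumes busy is a volatile apr_uint64_t only ever updated via APR atomics
 * and that apr_atomic_cas64() (APR 1.7+) is available. */
static void busy_dec_floor(volatile apr_uint64_t *busy)
{
    apr_uint64_t old;
    do {
        old = apr_atomic_read64(busy);
        if (old == 0) {
            return;  /* an increment was lost earlier; do not wrap around */
        }
        /* apr_atomic_cas64(mem, with, cmp) stores 'with' only when *mem == cmp
         * and returns the previous value of *mem. */
    } while (apr_atomic_cas64(busy, old - 1, old) != old);
}

This keeps the counter from wrapping, but as noted above it does nothing about
lost decrements, which still leave the counter elevated.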
>> On 30.08.23 at 17:19, jean-frederic clere wrote:
>>> Hi,
>>>
>>> All the balancers have thread/process safety issues, but with bybusyness the
>>> effect is worse: a worker may be left with a busy count greater than
>>> zero even though no request is being processed.
>>>
>>> busy is displayed by balancer_handler(), so users/customers will notice
>>> that the value doesn't return to zero...
>>>
>>> If you run a load test, the value of busy increases over time on all
>>> the workers.
>>>
>>> When using bybusyness, a peak in the load followed by a period of low load
>>> means that only the workers with the lowest busy values get used, while the
>>> ones stuck at a wrongly higher value are not used.
>>>
>>> In a test with 3 workers, I end up with these busy values:
>>> worker1: 3
>>> worker2: 0
>>> worker3: 2
>>> Running the load test several times, the busy values keep increasing on all
>>> workers.
>>>
>>> I am wondering if we could end up with something like:
>>> worker1: 1000
>>> worker2: 0
>>> worker3: 1000
>>>
>>> In this case bybusyness will send all the load to worker2 until we reach
>>> 1000 simultaneous requests on worker2... Obviously that looks bad.
>>>
>>> How to fix that?
>>> 1 - reset busy using a watchdog when elected (or transferred+read) has been
>>> unchanged for some time (using one of the timeouts we have on workers).
>>> 2 - warn in the docs that bybusyness is not the best choice for
>>> load balancing.
>>> 3 - create another balancer that just chooses a worker at random (a rough
>>> sketch follows below).
>
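Regarding option 3, a very rough sketch of what the finder of such an lbmethod
could look like, in the style of the existing mod_lbmethod_* finders
(find_best_byrandom is a made-up name; provider registration and the usual
standby handling of the shipped lbmethods are omitted):

#include "mod_proxy.h"
#include "apr_general.h"   /* apr_generate_random_bytes() */

/* Hypothetical sketch: pick a usable worker uniformly at random. */
static proxy_worker *find_best_byrandom(proxy_balancer *balancer,
                                        request_rec *r)
{
    int i, n = 0;
    proxy_worker **workers = (proxy_worker **)balancer->workers->elts;
    proxy_worker **usable;
    apr_uint32_t rnd = 0;

    usable = apr_palloc(r->pool,
                        balancer->workers->nelts * sizeof(proxy_worker *));

    /* collect the workers that are currently usable */
    for (i = 0; i < balancer->workers->nelts; i++) {
        if (PROXY_WORKER_IS_USABLE(workers[i])) {
            usable[n++] = workers[i];
        }
    }
    if (n == 0) {
        return NULL;   /* nothing usable; the caller reports the error */
    }

    /* pick one at random; no shared counters, so nothing can leak */
    if (apr_generate_random_bytes((unsigned char *)&rnd, sizeof(rnd))
            != APR_SUCCESS) {
        rnd = 0;   /* fall back to the first usable worker */
    }
    return usable[rnd % n];
}

Being stateless, such a method sidesteps the shared busy bookkeeping entirely,
at the cost of ignoring the actual load on each worker.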
> --
> Cheers
>
> Jean-Frederic