Isn't the call to find the best balancer mutex protected?

> On Aug 31, 2023, at 7:44 AM, jean-frederic clere <jfcl...@gmail.com> wrote:
> 
> On 8/30/23 17:33, Rainer Jung wrote:
>> Hi JFC,
>> I have not checked our current code, but the topic reminds me of our history 
>> in mod_jk land. There we switched the counters to atomics where available. 
>> The other problematic part could be how to handle process-local counters 
>> versus global counters.
> 
> Using apr_atomic_inc32()/apr_atomic_dec32() on the apr_size_t busy won't work?
> Actually, apr_size_t for busy is probably overkill. Does using apr_atomic_add64() 
> and apr_atomic_dec64() make sense here?
> 
> Anyway I will give it a try.
> 
>> Busyness was especially problematic for mod_jk as well, because we never 
>> decremented below zero if we lost increments, but if we lost decrements the 
>> counters stayed elevated. I think we no longer have such problems there.
>> Best regards,
>> Rainer
>> Am 30.08.23 um 17:19 schrieb jean-frederic clere:
>>> Hi,
>>> 
>>> All the balancers have thread/process safety issues, but with bybusyness the 
>>> effect is worse: basically, a worker may stay with a busy count greater than 
>>> zero even when no request is being processed.
>>> 
>>> busy is displayed in the balancer_handler() so users/customers will notice 
>>> the value doesn't return to zero...
>>> 
>>> If you run a load test, the value of busy will increase over time, and in all 
>>> the workers.
>>> 
>>> When using bybusyness, peaks in the load followed by not much load mean that 
>>> the workers with the lowest busy values get used and the ones with a wrongly 
>>> elevated value do not.
>>> 
>>> In a test with 3 workers, I end up with busy:
>>> worker1: 3
>>> worker2: 0
>>> worker3: 2
>>> Running the load test several times, the busy values keep increasing in all 
>>> workers.
>>> 
>>> I am wondering if we could end up with something like:
>>> worker1: 1000
>>> worker2: 0
>>> worker3: 1000
>>> 
>>> In this case bybusyness will send all the load to worker2 until we reach 
>>> 1000 simultaneous requests on worker2... Obviously that looks bad.
>>> 
>>> How to fix that?
>>> 1 - reset busy using a watchdog when elected (or transferred+read) has been 
>>> unchanged for some time (using one of the timeouts we have on workers).
>>> 2 - warn in the docs that bybusyness is not the best choice for 
>>> load balancing.
>>> 3 - create another balancer that just chooses a worker at random.
> 
> -- 
> Cheers
> 
> Jean-Frederic
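
For illustration only, here is a minimal sketch of the two points raised in
the thread: keeping busy as an atomic counter that never drops below zero,
and how a "lowest busy wins" choice behaves when some counters are stuck at a
stale high value. It uses C11 <stdatomic.h> rather than APR so that it stands
alone; inside httpd the analogous calls would presumably be apr_atomic_inc32()
and apr_atomic_dec32() on a 32-bit field. The names worker_t, busy_inc,
busy_dec and pick_least_busy are hypothetical and are not taken from
mod_proxy or mod_jk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a balancer member; not the mod_proxy struct. */
typedef struct {
    const char *name;
    _Atomic uint32_t busy;   /* requests currently in flight */
} worker_t;

/* Count a request as in flight on this worker. */
static void busy_inc(worker_t *w)
{
    atomic_fetch_add(&w->busy, 1);
}

/* Release a request; the CAS loop keeps the counter from wrapping below 0. */
static void busy_dec(worker_t *w)
{
    uint32_t cur = atomic_load(&w->busy);
    while (cur > 0 &&
           !atomic_compare_exchange_weak(&w->busy, &cur, cur - 1)) {
        /* A failed CAS stores the current value in cur; just retry. */
    }
}

/* Simplified "lowest busy wins" choice; ties go to the first worker. */
static worker_t *pick_least_busy(worker_t *workers, size_t n)
{
    assert(workers != NULL && n > 0);
    worker_t *best = &workers[0];
    for (size_t i = 1; i < n; i++) {
        if (atomic_load(&workers[i].busy) < atomic_load(&best->busy)) {
            best = &workers[i];
        }
    }
    return best;
}
```

With counters stuck at 1000, 0 and 1000 as in the example above,
pick_least_busy() keeps returning the second worker until its own count
catches up, which is the starvation effect described. Atomic updates only
prevent lost increments and decrements between threads or processes sharing
the counter; they do not repair a counter that is already wrong.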
