Hi Damien,

On Tue, Apr 09, 2024 at 03:37:07PM +0000, Damien Claisse wrote:
> We observed that a dynamic server whose health check is down for longer
> than the slowstart delay at startup doesn't trigger the warmup phase; it
> receives full traffic immediately. This has been confirmed by checking
> the haproxy UI: the weight is immediately the full one (e.g. 75/75),
> without any throttle applied. Further tests showed the same behavior
> when the server was in maintenance, and even when it entered a down or
> maintenance state after being up.
> Another issue is that if the server is down for less time than the
> slowstart delay, when it comes back up it briefly has a much higher
> weight than expected for a slowstart.
> 
> An easy way to reproduce it is the following:
> - Add a server with e.g. a 20s slowstart and a weight of 10 in the
>   config file
> - Put it in maintenance using the CLI (set server be1/srv1 state maint)
> - Wait more than 20s, then enable it again (set server be1/srv1 state
>   ready)
> - Observe the UI: the weight will show 10/10 immediately.
> If the server was down for less than 20s, you'd briefly see inconsistent
> weight and throttle values, e.g. a 50% throttle and a weight of 5 if the
> server comes back up after 10s, before going back to 6% after a second
> or two.
> 
> Code analysis shows that the logic in server_recalc_eweight stops the
> warmup task by setting the server's next state to SRV_ST_RUNNING if it
> didn't change state for longer than the slowstart duration, regardless
> of its current state. As a consequence, a server that is down or
> disabled for longer than the slowstart duration will never enter the
> warmup phase when it comes back up.
> 
> Regarding the weight when the server comes back up, the issue is that
> even while the server is down, we still compute its next weight as if it
> were up. Hence when it comes back up, it can briefly have a much higher
> weight than expected during slowstart, until the warmup task is called
> again after last_change is updated.
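The second point can be sketched the same way. Again this is a
hypothetical model rather than HAProxy's implementation: the function
names, the boolean is_up flag and the linear ramp rounded up are
assumptions. With a weight of 10 and a 20s slowstart, a server that has
been down for 10s gets a precomputed weight of 5 in the buggy variant,
which it carries for a moment after coming back up; the fixed variant
gives a down server no weight at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Weight of a warming-up server 'elapsed' seconds into a slowstart of
 * 'slowstart' seconds: a linear ramp up to the user weight, rounded up. */
static unsigned warmup_weight(unsigned uweight, unsigned elapsed,
                              unsigned slowstart)
{
	assert(slowstart > 0);
	if (elapsed >= slowstart)
		return uweight;
	return (uweight * elapsed + slowstart - 1) / slowstart;
}

/* Buggy model: the next weight is derived from the time since the last
 * state change even while the server is down. */
static unsigned next_eweight_buggy(bool is_up, unsigned uweight, unsigned now,
                                   unsigned last_change, unsigned slowstart)
{
	(void)is_up; /* the current state is ignored */
	assert(now >= last_change);
	return warmup_weight(uweight, now - last_change, slowstart);
}

/* Fixed model: a down server gets no weight, so no stale warmup weight is
 * left over for the moment it comes back up. */
static unsigned next_eweight_fixed(bool is_up, unsigned uweight, unsigned now,
                                   unsigned last_change, unsigned slowstart)
{
	assert(now >= last_change);
	if (!is_up)
		return 0;
	return warmup_weight(uweight, now - last_change, slowstart);
}
```

In this model, next_eweight_buggy(false, 10, 10, 0, 20) yields 5, matching
the 50% throttle seen in the reproduction, while next_eweight_fixed()
yields 0 for the same down server.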
> 
> This patch aims to fix both issues.
(...)

Your analysis makes a lot of sense, and I'm not much surprised that this
was revealed by dynamic servers, because the startup sequence before
them was very stable and well established. So if certain parts start
in a different order, it can certainly have such visible effects. Good
catch!

It's now merged, thank you!
Willy
