We observed that a dynamic server which health check is down for longer
than slowstart delay at startup doesn't trigger the warmup phase, it
receives full traffic immediately. This has been confirmed by checking
haproxy UI, weight is immediately the full one (e.g. 75/75), without any
throttle applied. Further tests showed that it was similar if it was in
maintenance, and even when entering a down or maintenance state after
being up.
Another issue is that if the server is down for less time than
slowstart, when it comes back up, it briefly has a much higher weight
than expected for a slowstart.

An easy way to reproduce is to do the following:
- Add a server with e.g. a 20s slowstart and a weight of 10 in config
  file
- Put it in maintenance using CLI (set server be1/srv1 state maint)
- Wait more than 20s, enable it again (set server be1/srv1 state ready)
- Observe UI, weight will show 10/10 immediately.
If server was down for less than 20s, you'd briefly see a weight and
throttle value that is inconsistent, e.g. 50% throttle value and a
weight of 5 if server comes back up after 10s before going back to
6% after a second or two.

Code analysis shows that the logic in server_recalc_eweight stops the
warmup task by setting server's next state to SRV_ST_RUNNING if it
didn't change state for longer than the slowstart duration, regardless
of its current state. As a consequence, a server being down or disabled
for longer than the slowstart duration will never enter the warmup phase
when it will be up again.

Regarding the weight when server comes back up, issue is that even if
the server is down, we still compute its next weight as if it was up,
hence when it comes back up, it can briefly have a much higher weight
than expected during slowstart, until the warmup task is called again
after last_change is updated.

This patch aims to fix both issues.
---
 src/server.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/server.c b/src/server.c
index ea93e1ff4..529fcf169 100644
--- a/src/server.c
+++ b/src/server.c
@@ -2279,15 +2279,17 @@ void server_recalc_eweight(struct server *sv, int 
must_update)
        unsigned w;
 
        if (ns_to_sec(now_ns) < sv->last_change || ns_to_sec(now_ns) >= 
sv->last_change + sv->slowstart) {
-               /* go to full throttle if the slowstart interval is reached */
-               if (sv->next_state == SRV_ST_STARTING)
+               /* go to full throttle if the slowstart interval is reached 
unless server is currently down */
+               if ((sv->cur_state != SRV_ST_STOPPED) && (sv->next_state == 
SRV_ST_STARTING))
                        sv->next_state = SRV_ST_RUNNING;
        }
 
        /* We must take care of not pushing the server to full throttle during 
slow starts.
         * It must also start immediately, at least at the minimal step when 
leaving maintenance.
         */
-       if ((sv->next_state == SRV_ST_STARTING) && (px->lbprm.algo & 
BE_LB_PROP_DYN))
+       if ((sv->cur_state == SRV_ST_STOPPED) && (sv->next_state == 
SRV_ST_STARTING) && (px->lbprm.algo & BE_LB_PROP_DYN))
+               w = 1;
+       else if ((sv->next_state == SRV_ST_STARTING) && (px->lbprm.algo & 
BE_LB_PROP_DYN))
                w = (px->lbprm.wdiv * (ns_to_sec(now_ns) - sv->last_change) + 
sv->slowstart) / sv->slowstart;
        else
                w = px->lbprm.wdiv;
-- 
2.34.1

Reply via email to