We observed that a dynamic server which health check is down for longer than slowstart delay at startup doesn't trigger the warmup phase, it receives full traffic immediately. This has been confirmed by checking haproxy UI, weight is immediately the full one (e.g. 75/75), without any throttle applied. Further tests showed that it was similar if it was in maintenance, and even when entering a down or maintenance state after being up. Another issue is that if the server is down for less time than slowstart, when it comes back up, it briefly has a much higher weight than expected for a slowstart.
An easy way to reproduce is to do the following: - Add a server with e.g. a 20s slowstart and a weight of 10 in config file - Put it in maintenance using CLI (set server be1/srv1 state maint) - Wait more than 20s, enable it again (set server be1/srv1 state ready) - Observe UI, weight will show 10/10 immediately. If server was down for less than 20s, you'd briefly see a weight and throttle value that is inconsistent, e.g. 50% throttle value and a weight of 5 if server comes back up after 10s before going back to 6% after a second or two. Code analysis shows that the logic in server_recalc_eweight stops the warmup task by setting server's next state to SRV_ST_RUNNING if it didn't change state for longer than the slowstart duration, regardless of its current state. As a consequence, a server being down or disabled for longer than the slowstart duration will never enter the warmup phase when it will be up again. Regarding the weight when server comes back up, issue is that even if the server is down, we still compute its next weight as if it was up, hence when it comes back up, it can briefly have a much higher weight than expected during slowstart, until the warmup task is called again after last_change is updated. This patch aims to fix both issues. --- src/server.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/server.c b/src/server.c index ea93e1ff4..529fcf169 100644 --- a/src/server.c +++ b/src/server.c @@ -2279,15 +2279,17 @@ void server_recalc_eweight(struct server *sv, int must_update) unsigned w; if (ns_to_sec(now_ns) < sv->last_change || ns_to_sec(now_ns) >= sv->last_change + sv->slowstart) { - /* go to full throttle if the slowstart interval is reached */ - if (sv->next_state == SRV_ST_STARTING) + /* go to full throttle if the slowstart interval is reached unless server is currently down */ + if ((sv->cur_state != SRV_ST_STOPPED) && (sv->next_state == SRV_ST_STARTING)) sv->next_state = SRV_ST_RUNNING; } /* We must take care of not pushing the server to full throttle during slow starts. * It must also start immediately, at least at the minimal step when leaving maintenance. */ - if ((sv->next_state == SRV_ST_STARTING) && (px->lbprm.algo & BE_LB_PROP_DYN)) + if ((sv->cur_state == SRV_ST_STOPPED) && (sv->next_state == SRV_ST_STARTING) && (px->lbprm.algo & BE_LB_PROP_DYN)) + w = 1; + else if ((sv->next_state == SRV_ST_STARTING) && (px->lbprm.algo & BE_LB_PROP_DYN)) w = (px->lbprm.wdiv * (ns_to_sec(now_ns) - sv->last_change) + sv->slowstart) / sv->slowstart; else w = px->lbprm.wdiv; -- 2.34.1