Hello Aaron,

On Tue, Sep 03, 2024 at 04:24:57PM -0400, Aaron Kuehler wrote:
> Allow the user to set the "initial state" of a server.

Thank you very much for working on this one!

> Context:
> 
> Servers are always set in an UP status by default. In
> some cases, further checks are required to determine if the server is
> ready to receive client traffic.
> 
> This introduces the "init-state {up|down}" configuration parameter to
> the server.
> 
> - when not set, the server is considered available as soon as a connection
>   can be established.
> - when set to 'up', the server is considered available as soon as a
>   connection can be established.
> - when set to 'down', the server is initially considered unavailable until
>   it successfully completes its health checks.

I'm having a difficulty understanding the first two cases, and it seems
from the patch that they are in fact the same. I think that "as soon as
a connection can be established" is confusing as it makes one think that
it still needs to perform one check (otherwise it's not clear what this
"connection" refers to).

Thus I suggest that the first two lines are merged into one with an
explanation like this one:

  - when set to 'up' (the default), the server is considered immediately
    available and will initiate a health check that can turn it to the DOWN
    state immediately if it fails.

And the same could be done in the doc, which contains the same text.

> The server's init-state is considered when the HAProxy instance
> is (re)started, a new server is detected (for example via service
> discovery / DNS resolution), a server exits maintenance, etc.

You're right, it's important to mention this because it's far from being
obvious.

By analogy with the "up" state, the "down" should be at check_rise-1 
instead of zero I think, so that it's sufficient to succeed one check
to turn it on. Otherwise it can take a lot of time to introduce some
servers which are slowly checked.

I'm just thinking about something, since I've read that complaint as well
a few times: in some environments, some servers are having difficulties
responding positively to initial health checks, and due to the way we're
starting at the check_rise value, the first failure is sufficient to turn
it down. A few users already asked for a way to turn the server completely
up. I think there could be value in having 3 different init states:
  - 'down': needs to validate one check before turning up
  - 'probe': the current situation, where it's up until the first test
    makes it fail
  - 'up': it starts fully up (check_rise+check_fall-1 IIRC).

Or maybe even 4 states:
  - "fully-down": 0, requires that all checks are valid before it turns up
  - "down": check_rise-1, requires one successful check to turn up
  - "up": check_rise, requires one faulty check to turn down
  - "fully-up": check_rise+check_fall-1, requires that all checks fail to
    turn down.

And we'd default to "up" like in your patch.

What do you think ?

BTW your patch is super clean, I've only found one tiny thing (just to
say that I found something):

> @@ -6504,7 +6526,11 @@ static int _srv_update_status_adm(struct server *s, 
> enum srv_adm_st_chg_cause ca
>                */
>               if (s->check.state & CHK_ST_ENABLED) {
>                       s->check.state &= ~CHK_ST_PAUSED;
> -                     s->check.health = s->check.rise; /* start OK but check 
> immediately */
> +                     if(s->init_state == SRV_INIT_STATE_DOWN) {
                         ^^
missing space here. :-)

> +                             s->check.health = 0; /* checks must pass before 
> the server is considered UP */
> +                     } else {
> +                             s->check.health = s->check.rise; /* start OK 
> but check immediately */
> +                     }
>               }

Thanks!
Willy


Reply via email to