Hi Willy,

I had time to do some quick tests:
- balance source does indeed stop the issue from occurring
- building HAProxy from the latest 3.0-stable (reports 3.0.5-cdb7dac
2024/10/24) stops the issue from occurring with roundrobin (I saw a few
5xx over a few seconds, but I think those might have nothing to do with
HAProxy)

The reason we use roundrobin (actually static-rr) outside of this test
environment is that our setup looks more like the following: we like to
locate one HAProxy instance close to each server and have it fall back to
some other servers which have higher latency.
---------------------------------
defaults http_defaults_1
  mode http
  hash-type consistent
  hash-balance-factor 150
  maxconn 65536
  http-reuse safe
  option abortonclose
  option allbackups
  timeout check 2s
  timeout connect 15s
  timeout client 30s
  timeout server 30s
...
use_backend server-nearby-backend    if is-server { nbsrv(server-nearby-backend) ge 1 }
use_backend server-fallback-backend  if is-server
...
backend server-nearby-backend from http_defaults_1
  balance static-rr
  option tcp-smart-connect
  option redispatch 1
  retries 2
  server server-001 fd8a...0001:8080 maxconn 32 check fall 2 inter 4029 backup alpn h2
  server server-002 fd8a...0002:8080 maxconn 32 check fall 2 inter 4029 backup alpn h2
  server server-003 fd8a...0003:8080 maxconn 32 check fall 2 inter 279 downinter 4029 weight 100 alpn h2
...
backend server-fallback-backend from http_defaults_1
  balance static-rr
  option tcp-smart-connect
  option redispatch 1
  retries 2
  server server-004 fd8a...0004:8080 maxconn 64 check fall 2 inter 4029 weight 50 alpn h2
  server server-005 fd8a...0005:8080 maxconn 64 check fall 2 inter 4029 weight 50 alpn h2
  server server-006 fd8a...0006:8080 maxconn 64 check fall 2 inter 4029 weight 50 alpn h2
  server server-007 fd8a...0007:8080 maxconn 64 check fall 2 inter 4029 weight 50 alpn h2
  server server-008 fd8a...0008:8080 maxconn 64 check fall 2 inter 4029 weight 50 alpn h2
---------------------
I think we could consider using balance source for server-nearby-backend
if necessary, since there will only be one active server and option
allbackups will distribute traffic evenly across the 2 backups there.
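If we did go that route, I think the change would just be swapping the
balance algorithm in that backend, something like this (an untested
sketch, same servers as above):
---------------------------------
backend server-nearby-backend from http_defaults_1
  # hash on the client source address instead of rotating round-robin;
  # with hash-type consistent from the defaults, each source maps to a
  # stable server
  balance source
  option tcp-smart-connect
  option redispatch 1
  retries 2
  server server-001 fd8a...0001:8080 maxconn 32 check fall 2 inter 4029 backup alpn h2
  server server-002 fd8a...0002:8080 maxconn 32 check fall 2 inter 4029 backup alpn h2
  server server-003 fd8a...0003:8080 maxconn 32 check fall 2 inter 279 downinter 4029 weight 100 alpn h2
---------------------------------
With only server-003 active that shouldn't change where traffic goes in
normal operation, and option allbackups keeps both backups eligible if it
goes down.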
But I think upgrading to 3.0.6 will probably be the better solution for us.
Thanks for your help. It sounds like you won't need to investigate this
since it is already fixed; otherwise, let me know if I can be of
assistance with any diagnostics.

Cheers,
Miles

On Tue, 5 Nov 2024 at 21:56, Willy Tarreau <w...@1wt.eu> wrote:

> Hi Miles,
>
> On Tue, Nov 05, 2024 at 06:54:08PM +1100, Miles Hampson wrote:
> > Hi,
> >
> > I've encountered a situation where HAProxy does not fail over from a
> > server it has marked as DOWN to a backup server it has marked as UP.
> > I have
> > managed to reproduce this consistently in a test environment, here is the
> > (I hope) relevant configuration
> >
> > defaults http_defaults_1
> >   mode http
> >   hash-type consistent
> >   hash-balance-factor 150
> >   maxconn 4096
> >   http-reuse safe
> >   option abortonclose
> >   option allbackups
> >   timeout check 2s
> >   timeout connect 15s
> >   timeout client 30s
> >   timeout server 30s
> >
> > backend server-backend from http_defaults_1
> >   balance roundrobin
> >   # We keep at the default because retrying anything else risks
> >   # duplicating events on these servers
> >   retry-on conn-failure
> >   server server-006 fd8a...0006:8080 maxconn 32 check inter 250 alpn h2
> >   server server-012 fd8a...0012:8080 maxconn 32 check backup alpn h2
> >
> > This is with HAProxy 3.0.3-95a607c running on a VPS with 16GB RAM (we
> > have seen the same issue on a dedicated server with 64GB though), which
> > is running Ubuntu 24.04.1 with the default net.ipv4.tcp_retries2 = 15,
> > net.ipv4.tcp_syn_retries = 6, and tcp_fin_timeout of 60s (these also
> > apply to IPv6 connections). CPU usage is under 20%.
> >
> > Once I have a small load running (20 req/sec), if I make the 8080 port
> > on server-006 temporarily unavailable by restarting the service on it,
> > HAProxy logs the transition of server-006 to DOWN (and the stats socket
> > and server_check_failure metrics show the same) and server-012 picks up
> > requests as expected, with no 5xx errors recorded.
> > However if I instead kill the server-006 machine (so that a TCP health
> > check to it with `nc` fails with a timeout rather than a connection
> > refused), the server is marked as DOWN as before, but all requests
> > coming in to HAProxy for that backend return a 5xx error to the client
> > after 15s (the timeout connect) and server-012 does not receive any
> > requests despite showing as UP in the stats socket. This "not failed
> > over" state of 100% 5xx errors goes on for minutes, sometimes hours,
> > and how long seems to depend on the load. Reducing the load to a few
> > requests a minute avoids the issue (and dropping the load when it is in
> > the "not failed over" state also fixes the issue). I would have
> > expected the <=32 in flight requests to have been redispatched to 012
> > as soon as 006 was marked down, and the other <=4096-32 requests to
> > have been held in the frontend queue until the backend ones were
> > finished, but understandably things get more complicated when you
> > consider timeouts.
>
> This reminds me of a bug related to queue handling: if there were already
> requests queued on a server, subsequent requests would go directly into
> that queue as well regardless of the server's state, and be picked up by
> that server once it finished processing a previous request. I would
> appreciate it if you could recheck with an up-to-date version to be sure
> we're not chasing an already fixed issue.
>
> Also, if you have only one server of each type here (one active and one
> backup), deterministic algorithms such as "balance source" should
> normally not exhibit this behavior. I'm not suggesting this as a
> solution but as a temporary workaround, of course.
>
> If you're building from sources, you can even try the latest 3.0-stable
> (about to be released as 3.0.6 soon).
>
> Thanks,
> Willy
>
