Hi,

I think I found a case where haproxy-2.0.14 with HTX and retry-on
all-retryable-errors sometimes seems to select the wrong backend/server
for the retry (it doesn't happen on every retry).

I found that sometimes, when our wordpress backend returned a 500 error,
haproxy would retry on the wrong backend. Here's a very simplified, redacted
config (I can provide the full config off-list if needed).

wordpress frontend has http/2 enabled:
frontend FE_wp
     bind ipv4@address-here:443 name wpv4a1 ssl crt /etc/haproxy/ssl/crt1.pem alpn h2,http/1.1 crt /etc/haproxy/ssl/crt2.pem alpn h2,http/1.1 ssl-min-ver TLSv1.2
...
use_backend %[req.hdr(Host),lower,map_dom(/path/to/wordpress_backends.map,BE_wp_blogs)]
default_backend BE_wp_blogs


# This is the backend that sometimes gave 500 errors
backend BE_wp_blogs2
...
    retries     2
    option      redispatch
    option      prefer-last-server
    retry-on    all-retryable-errors
    balance     roundrobin

    timeout connect     4500ms
    timeout server      40s
    timeout queue       4s
    timeout check       5s

    cookie cookiename insert indirect nocache httponly maxidle 20m
    default-server inter 15s downinter 25s rise 2 error-limit 250 on-error fail-check
    server name1 ip1:8443 id 1 cookie name1 check observe layer7
    server name2 ip2:8443 id 2 cookie name2 check observe layer7
    server name3 ip3:8443 id 3 cookie name3 check observe layer7

# These are the unrelated/wrong backends
backend wrong1
...
    server diff1 different_ip:2048 id 1 cookie diff1 maxconn 500 track BE_other/ezauth1
    server diff2 different_ip:2048 id 2 cookie diff2 maxconn 500 track BE_other/ezauth2

backend wrong2
...
    server diffssl1 different_ip:2443 id 1 cookie diff1 maxconn 500 track BE_other/ezauth1
    server diffssl2 different_ip:2443 id 2 cookie diff2 maxconn 500 track BE_other/ezauth2

All backends had servers with the same numeric ids: ids 1-3 for the wordpress
servers and ids 1-2 for the "wrong" backend servers. I tried changing the
server ids in all backends, but I still sometimes get the wrong backend/server.
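
For illustration, making the ids unique across backends would look something
like the sketch below. The id values are made up for this example (they are
not the ones actually used), and only the server lines are shown:

```haproxy
# Hypothetical id values, chosen only so that no two backends share one
backend BE_wp_blogs2
    server name1 ip1:8443 id 101 cookie name1 check observe layer7
    server name2 ip2:8443 id 102 cookie name2 check observe layer7
    server name3 ip3:8443 id 103 cookie name3 check observe layer7

backend wrong1
    server diff1 different_ip:2048 id 201 cookie diff1 maxconn 500 track BE_other/ezauth1
    server diff2 different_ip:2048 id 202 cookie diff2 maxconn 500 track BE_other/ezauth2
```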

If I set "no option http-use-htx" or "retry-on conn-failure", then AFAIK
(limited testing) the problem doesn't happen.
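
To be explicit about those two settings, here is a rough sketch of each
variant on its own. Placing the HTX option in a defaults section is only an
assumption for the example, and the remaining backend lines are elided as in
the config above:

```haproxy
# Variant 1: switch HTX off
# (shown in "defaults" as an assumption; the section is not stated above)
defaults
    no option   http-use-htx

# Variant 2: keep HTX, but only retry on connection failures
backend BE_wp_blogs2
    retries     2
    option      redispatch
    retry-on    conn-failure
```

Neither variant is meant as a fix; they only narrow down which combination of
settings seems to be involved.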

I haven't managed to reproduce this with a simple PHP script that just returns
a 500 error, so there could be some timing issue that triggers it.

Example haproxy log when the request goes to the wrong backend/server:
haproxy[258199]: client-ipv6-address:42630 [19/May/2020:16:09:29.366] FE_wp~ BE_wp_blogs2/name3 0/0/89/0/89 400 134 - - --VU 1/1/0/121/2 0/0 {hostheader} "GET /wp-admin/ HTTP/2.0" 443 HTTP/2

(And the wrong backend/server (tomcat in this case) logs this:
haproxy-ip-address - - [19/May/2020:16:09:29 +0300] "-" 400 - 0ms JSESSIONID=-
The 400 error is because haproxy sends http to the tomcat https port.)

tshark -e text shows this for the wordpress backend 500 response that
can trigger retry on wrong backend:
      "layers": {
        "http.response.code": [
          "500"
        ],
        "http.response.phrase": [
          "Internal Server Error"
        ],
        "text": [
          "Timestamps",
          "HTTP/1.1 500 Internal Server Error\\r\\n",
          "\\r\\n",
          "HTTP chunked response",
          "Data chunk (2892 octets)",
          "End of chunked encoding",
          "\\r\\n",
(I've omitted the response body).

Are there any more tests I could run to figure out what's going on? Should I
perhaps try with the latest 2.2-dev?

-Jarno

(haproxy -vv:
HA-Proxy version 2.0.14 2020/04/02 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE=1 USE_PCRE_JIT=1 USE_REGPARM=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER +PCRE
+PCRE_JIT -PCRE2 -PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD
-PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY
+LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO
+OPENSSL -LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO
+NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER
+PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=4).
Built with OpenSSL version : OpenSSL 1.1.1d  10 Sep 2019
Running on OpenSSL version : OpenSSL 1.1.1d  10 Sep 2019
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT
IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.32 2012-11-30
Running on PCRE version : 8.32 2012-11-30
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace
)

-- 
Jarno Huuskonen
