On 19/05/2020 at 15:36, Jarno Huuskonen wrote:

I think I found a case where haproxy-2.0.14 with htx and retry-on
all-retryable-errors sometimes seems to select the wrong backend/server
for the retry. (It doesn't happen on every retry.)

I found that sometimes, when our wordpress backend gave a 500 error,
haproxy would retry on the wrong backend. Here's a very simplified,
redacted config (I can provide the full config off list if needed).

wordpress frontend has http/2 enabled:
frontend FE_wp
      bind ipv4@address-here:443 name wpv4a1 ssl crt /etc/haproxy/ssl/crt1.pem alpn h2,http/1.1 crt /etc/haproxy/ssl/crt2.pem alpn h2,http/1.1 ssl-min-ver TLSv1.2
      default_backend BE_wp_blogs

# This is the backend that sometimes gave 500 errors
backend BE_wp_blogs2
     retries     2
     option      redispatch
     option      prefer-last-server
     retry-on    all-retryable-errors
     balance     roundrobin

     timeout connect     4500ms
     timeout server      40s
     timeout queue       4s
     timeout check       5s

     cookie cookiename insert indirect nocache httponly maxidle 20m
     default-server inter 15s downinter 25s rise 2 error-limit 250 on-error fail-check
     server name1 ip1:8443 id 1 cookie name1 check observe layer7
     server name2 ip2:8443 id 2 cookie name2 check observe layer7
     server name3 ip3:8443 id 3 cookie name3 check observe layer7

# These are the unrelated/wrong backends that the retry sometimes goes to
backend wrong1
     server diff1 different_ip:2048 id 1 cookie diff1 maxconn 500 track
     server diff2 different_ip:2048 id 2 cookie diff2 maxconn 500 track

backend wrong2
     server diffssl1 different_ip:2443 id 1 cookie diff1 maxconn 500 track BE_other/ezauth1
     server diffssl2 different_ip:2443 id 2 cookie diff2 maxconn 500 track BE_other/ezauth2

All backends had servers with the same numeric ids: ids 1-3 for the
wordpress backend and ids 1-2 for the "wrong" backends. I tried changing
the server ids in all backends, but I still sometimes get the wrong
backend/server.

If I set either "no option http-use-htx" or "retry-on conn-failure",
then AFAIK (limited testing) the problem doesn't happen.
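
As a rough sketch, the two workarounds mentioned above would look something
like this. Where each line is placed is an assumption, since the full config
is not shown: the HTX option is put in defaults here on the assumption that
the frontend and backend should stay in the same mode, and only one of the
two workarounds should be needed.

defaults
     # Workaround 1 (assumed placement): disable HTX for all proxies
     no option http-use-htx

backend BE_wp_blogs2
     # Workaround 2 (alternative): retry only on connection failures,
     # replacing "retry-on all-retryable-errors"
     retry-on    conn-failure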

I haven't managed to reproduce it with a simple php script that just
returns a 500 error, so there could be some timing issue that triggers this.

Example haproxy log when request goes to wrong backend/server:
haproxy[258199]: client-ipv6-address:42630 [19/May/2020:16:09:29.366]
FE_wp~ BE_wp_blogs2/name3 0/0/89/0/89 400 134 - - --VU 1/1/0/121/2 0/0
{hostheader} "GET /wp-admin/ HTTP/2.0" 443 HTTP/2

And the wrong backend/server (tomcat in this case) logs this:
haproxy-ip-address - - [19/May/2020:16:09:29 +0300] "-" 400 - 0ms
JSESSIONID=-
(The 400 error is because haproxy sends http to the tomcat https port.)

tshark -e text shows this for the wordpress backend 500 response that
can trigger retry on wrong backend:
       "layers": {
         "http.response.code": [
         "http.response.phrase": [
           "Internal Server Error"
         "text": [
           "HTTP/1.1 500 Internal Server Error\\r\\n",
           "HTTP chunked response",
           "Data chunk (2892 octets)",
           "End of chunked encoding",
(I've omitted the response body).

Are there any more tests I could try to figure out what's going on?
Perhaps try with the latest 2.2-dev?


It was already reported on github and seems to be fixed. We are just waiting for feedback to be sure it is fixed before backporting the patch. See https://github.com/haproxy/haproxy/issues/623.

If you try the latest 2.2 snapshot, it should be good. You may also try to cherry-pick commit 8cabc9783 onto 2.0.

Christopher Faulet
