Hello everyone,

I'm currently troubleshooting intermittent 520 errors returned by
Cloudflare in front of an HAProxy-based load balancing infrastructure.

These errors are rare—affecting about 0.001% of requests—which makes them
especially tricky to diagnose. According to Cloudflare, a 520 indicates the
origin server (HAProxy, in this case) unexpectedly closed the connection or
returned an empty response.

Key Observations
- it happens on several frontends and on different haproxy instances.
- The issue only occurs when the client request protocol is HTTP/2 (http/3
is disabled on our side at the moment).
- Disabling Cloudflare's "HTTP/2 to origin" option completely eliminates
the issue.
This strongly suggests the problem lies in the interaction between
Cloudflare's HTTP/2 implementation and HAProxy's HTTP/2 support.

Cloudflare Logs for 520 Events
"OriginSSLProtocol": "unknown",
"OriginTCPHandshakeDurationMs": 11,
"OriginTLSHandshakeDurationMs": 12,
"OriginRequestHeaderSendDurationMs": 0,
"OriginResponseDurationMs": 0,
"ClientSSLCipher": "AEAD-AES128-GCM-SHA256",
"ClientSSLProtocol": "TLSv1.3"

To me, this implies:
- Cloudflare successfully opens a TCP connection to HAProxy
(OriginTCPHandshakeDurationMs is non-zero),
- But fails during the TLS handshake, as evidenced by "OriginSSLProtocol":
"unknown".

HAProxy version 2.8.15 (issue also present on 2.8.11)

Relevant global settings:
    ssl-default-bind-ciphersuites
TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-client-sigalgs
rsa_pkcs1_sha256:rsa_pkcs1_sha384:rsa_pkcs1_sha512:ecdsa_secp256r1_sha256:ecdsa_secp384r1_sha384:ecdsa_secp521r1_sha512:rsa_pss_rsae_sha256:rsa_pss_rsae_sha384:rsa_pss_rsae_sha512:ed25519:ed448:rsa_pss_pss_sha256:rsa_pss_pss_sha384:rsa_pss_pss_sha512
    option tcp-smart-accept
    option tcp-smart-connect

Frontend configuration (abridged):
    mode http
    bind xxxxx:443 ssl crt-list /folder/redacted.txt ssl-min-ver TLSv1.2
tls-ticket-keys /var/run/tls_ticket_keys


I've tried enabling debug-style logging using:
    option log-separate-errors
    option logasap
    error-log-format '%ci:%cp -> %fi:%fp %b/%s | Tq=%Tq Tw=%Tw Tc=%Tc
Tr=%Tr Ta=%Ta Tt=%Tt TR=%TR HTTP=%ST (%hrl) B=%B | TLS=%sslv/%sslc
SNI=%[ssl_fc_sni] %HM %HP'

Unfortunately, these logs don't show any detail about the failed TLS
handshakes.

Questions
- have anyone experienced this issue ? If you have a small amount of 520 it
may not been printed on the interface, but it is there. Just go the
following URL :
https://dash.cloudflare.com/[accountID]/mydomain.com/analytics/traffic?status-code=520
- Are there any known issues with HAProxy and HTTP/2 in TLS termination
scenarios ?
- Is there any way to increase logging detail specifically for failed TLS
handshakes? tcpdump is a no go in this case due to the heavy trafic and the
minimal occurrence of the issue.

Because the issue is so rare (under 100 requests/day), every test takes
time to yield results. Any insights or ideas would be greatly appreciated.

Thanks in advance!
Olivier

Reply via email to