TCP connections resets during backend transfers

Artur Tue, 18 Oct 2022 02:16:39 -0700

Hello,

While renewing a node.js server and a galera cluster (mariadb), I'm seeing unexpected behaviour on TCP connections between the node.js application and mariadb.

There are a lot of connection resets during transfers on the backend side.

My previous (working) setup was based on Debian 10, mariadb 10.5, node.js 16 (and some dependencies) and haproxy 2.6. I had a server running several node.js processes and a 3-node galera mariadb cluster. To provide some HA, I configured haproxy as a TCP proxy for mariadb connections.

The usual setup is :
node.js -> haproxy -> mariadb

The node.js application uses a connection pool to maintain several open connections to the database server that may be idle for a long time. The timeouts are adjusted in haproxy to avoid disconnecting idle connections.

This setup worked just fine on old servers.

Then I set up new servers on Debian 11: a new mariadb galera cluster (10.6), a new node.js application server (no real changes in node.js software versions there) and haproxy (2.6.6 currently). The global setup is close to the old one but not exactly the same; I tried to stay as close as possible to the old setup.

Now, once I start the node.js application, the database connections are established, and after about 20 minutes I start to see application warnings about lost connections to the database. On the haproxy stats page I can see a lot of 'connections resets during transfers' on the backend side. On the database side I can see idle processes that stay there even if I close the node.js application or restart haproxy. These have to time out or be killed to disappear, as if there was no communication any more between haproxy and mariadb (on these TCP connections). At the same time, other database connections are established or continue to function. Maybe something related to idle connections?

If it helps: all these servers are VMs in the OVH public cloud, and communications between servers go through a private vlan in the same datacenter.

If I remove haproxy from the workflow (node.js -> mariadb) I no longer see any errors. But I don't understand why it worked fine before and is behaving this way now...

Any help is welcome.

My current haproxy setup :

global
  log /dev/log  local0
  log /dev/log  local1 notice
  chroot /var/lib/haproxy
  stats socket /run/haproxy/admin.sock mode 660 level admin
  stats timeout 30s
  user haproxy
  group haproxy
  daemon

  # Default SSL material locations
  ca-base /etc/ssl/certs
  crt-base /etc/ssl/private

  # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
  ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
  ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256

  ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

  ssl-dh-param-file /etc/haproxy/ssl/dhparams.pem
  tune.ssl.default-dh-param 2048

  maxconn 50000

  #nosplice

defaults
  log global
  option dontlognull
  option dontlog-normal
  timeout connect 5000
  timeout client  50000
  timeout server  50000

  #option tcpka

  errorfile 400 /etc/haproxy/errors/400.http
  errorfile 403 /etc/haproxy/errors/403.http
  errorfile 408 /etc/haproxy/errors/408.http
  errorfile 500 /etc/haproxy/errors/500.http
  errorfile 502 /etc/haproxy/errors/502.http
  errorfile 503 /etc/haproxy/errors/503.http
  errorfile 504 /etc/haproxy/errors/504.http

  option splice-auto
  option splice-request
  option splice-response

frontend db3_front
  bind 127.0.1.1:3306
  mode tcp

  # haproxy client connection timeout is 1 second longer than the default
  # mariadb wait_timeout, which is 28800 seconds;
  # this prevents haproxy from closing an idle connection for no reason
  timeout client 28801s
  maxconn 10000
  no log
  default_backend db3_back

backend db3_back
  mode tcp

  # haproxy server connection timeout is 1 second longer than the default
  # mariadb wait_timeout, which is 28800 seconds;
  # this prevents haproxy from closing an idle connection for no reason
  timeout server 28801s
  option mysql-check user hacheck post-41
  fullconn 10000
  timeout check 10s

  server db3sbg5 10.140.154.94:3306 maxconn 10000 check on-marked-down shutdown-sessions
  server db3de1 10.140.3.131:3306 maxconn 10000 backup check on-marked-down shutdown-sessions
  server db3gra5 10.140.103.12:3306 maxconn 10000 backup check on-marked-down shutdown-sessions


[similar redis proxy config removed]

listen stats

  bind *:443 interface ens3 ssl crt /etc/haproxy/ssl/server.pem alpn h2,http/1.1

  mode http
  no log
  maxconn 100
  stats enable
  stats uri /...
  stats refresh 5s
  stats show-legends
  stats show-node
  stats admin if TRUE
  ...

I tried some modifications in the haproxy config (nosplice or tcpka) but the errors are still there. I also tried previous haproxy versions (2.6.5, 2.6.4) but it doesn't solve the problem.
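
For reference, here is a sketch of what explicit keepalive timers could look like in this config. It rests on an assumption that has not been verified on these servers: that something on the path drops idle flows after roughly 20 minutes. With 'option tcpka' alone, the probe timing comes from the kernel, and on Linux the default idle time before the first probe is two hours unless net.ipv4.tcp_keepalive_time has been changed. The 300s / 30s / 3 values below are arbitrary examples, and the clitcpka-* / srvtcpka-* keywords require haproxy 2.2 or later.

```
frontend db3_front
  bind 127.0.1.1:3306
  mode tcp
  timeout client 28801s
  # send TCP keepalive probes towards the node.js side
  option clitcpka
  # ASSUMPTION: example values only, not tested on this setup
  clitcpka-idle 300s     # idle time before the first probe
  clitcpka-intvl 30s     # interval between unanswered probes
  clitcpka-cnt 3         # unanswered probes before the connection is dropped
  default_backend db3_back

backend db3_back
  mode tcp
  timeout server 28801s
  # send TCP keepalive probes towards mariadb
  option srvtcpka
  # ASSUMPTION: example values only, not tested on this setup
  srvtcpka-idle 300s
  srvtcpka-intvl 30s
  srvtcpka-cnt 3
```

If the resets stop with an idle time well under 20 minutes, that would point to an idle-flow timeout somewhere between haproxy and mariadb rather than to haproxy itself.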


--
Best regards,
Artur
