Hello,

While renewing my node.js servers and a galera cluster (mariadb), I'm seeing unexpected behaviour on TCP connections between the node.js application and mariadb.
There are a lot of connection resets during transfers on the backend side.

My previous (working) setup was based on Debian 10, mariadb 10.5, node.js 16 (and some dependencies) and haproxy 2.6. I had a server running several node.js processes and a 3-node galera mariadb cluster. To provide some HA, I configured haproxy as a TCP proxy for mariadb connections.
The usual setup is:
node.js -> haproxy -> mariadb
The node.js application uses a connection pool to maintain several open connections to the database server, which may be idle for a long time. The timeouts are adjusted in haproxy to avoid disconnecting idle connections.
This setup worked just fine on old servers.

Then I set up new servers on Debian 11: a new mariadb galera cluster (10.6), a new node.js application server (no real changes in node.js software versions there) and haproxy (currently 2.6.6). The overall setup is as close as I could make it to the old one, though not strictly identical. Now, once I start the node.js application, the database connections are established, and after about 20 minutes I start to see application warnings about lost connections to the database. On the haproxy stats page I can see a lot of 'connection resets during transfers' on the backend side. On the database side I can see idle processes that stay there even if I stop the node.js application or restart haproxy. These have to time out or be killed to disappear, as if there were no longer any communication between haproxy and mariadb on these TCP connections. At the same time, other database connections are established or continue to work. Maybe something related to idle connections?

In case it helps: all these servers are VMs in the OVH public cloud, and communication between servers goes through a private vlan in the same datacenter.

If I remove haproxy from the path (node.js -> mariadb), I no longer see any errors. But I don't understand why it worked fine before and behaves this way now...
Any help is welcome.

My current haproxy setup:

global
  log /dev/log  local0
  log /dev/log  local1 notice
  chroot /var/lib/haproxy
  stats socket /run/haproxy/admin.sock mode 660 level admin
  stats timeout 30s
  user haproxy
  group haproxy
  daemon

  # Default SSL material locations
  ca-base /etc/ssl/certs
  crt-base /etc/ssl/private

  # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
  ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
  ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
  ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

  ssl-dh-param-file /etc/haproxy/ssl/dhparams.pem
  tune.ssl.default-dh-param 2048

  maxconn 50000

  #nosplice

defaults
  log global
  option dontlognull
  option dontlog-normal
  timeout connect 5000
  timeout client  50000
  timeout server  50000

  #option tcpka

  errorfile 400 /etc/haproxy/errors/400.http
  errorfile 403 /etc/haproxy/errors/403.http
  errorfile 408 /etc/haproxy/errors/408.http
  errorfile 500 /etc/haproxy/errors/500.http
  errorfile 502 /etc/haproxy/errors/502.http
  errorfile 503 /etc/haproxy/errors/503.http
  errorfile 504 /etc/haproxy/errors/504.http

  option splice-auto
  option splice-request
  option splice-response

frontend db3_front
  bind 127.0.1.1:3306
  mode tcp
  # haproxy client connection timeout is 1 second longer than the default mariadb wait_timeout, which is 28800 seconds
  # this keeps haproxy from closing an idle connection for no reason
  timeout client 28801s
  maxconn 10000
  no log
  default_backend db3_back

backend db3_back
  mode tcp
  # haproxy server connection timeout is 1 second longer than the default mariadb wait_timeout, which is 28800 seconds
  # this keeps haproxy from closing an idle connection for no reason
  timeout server 28801s
  option mysql-check user hacheck post-41
  fullconn 10000
  timeout check 10s
  server db3sbg5 10.140.154.94:3306 maxconn 10000 check on-marked-down shutdown-sessions
  server db3de1  10.140.3.131:3306  maxconn 10000 backup check on-marked-down shutdown-sessions
  server db3gra5 10.140.103.12:3306 maxconn 10000 backup check on-marked-down shutdown-sessions

[similar redis proxy config removed]

listen stats
  bind *:443 interface ens3 ssl crt /etc/haproxy/ssl/server.pem alpn h2,http/1.1
  mode http
  no log
  maxconn 100
  stats enable
  stats uri /...
  stats refresh 5s
  stats show-legends
  stats show-node
  stats admin if TRUE
  ...

I tried some modifications to the haproxy config (nosplice or tcpka), but the errors are still there. I also tried previous haproxy versions (2.6.5, 2.6.4), but that didn't solve the problem.
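
As a rough sketch, the two modifications mentioned above correspond to enabling the directives that are commented out in the config shown earlier: nosplice in the global section and option tcpka in defaults. Only the affected lines are shown; everything else is assumed unchanged.

```
global
  # disable kernel TCP splicing for the whole process
  nosplice

defaults
  # enable TCP keepalive on both client-side and server-side sockets
  option tcpka
```

With nosplice set, the option splice-* lines in defaults should have no effect. With option tcpka, the keepalive timing comes from the operating system settings unless it is overridden elsewhere.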

--
Best regards,
Artur

