Hello,
OK, these lost connections during transfers on the server side were caused by
some firewall or other hardware timing out long-lived TCP connections.
To solve this problem I added 'option tcpka' to the defaults section of haproxy.
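For reference, here is a minimal sketch of where that line goes, reusing the
defaults section of my configuration quoted below. Only the relevant lines
are shown, and the indentation is just a matter of taste.

```
defaults
    log global
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    # send TCP keepalive probes on both the client and the server side
    option tcpka
```

Note that 'option tcpka' only turns keepalives on for the sockets. By
default, when and how often the probes are sent is decided by the kernel.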
Moreover, one can also adjust the following kernel variables (Linux default values shown):
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200
In my situation I had to lower these to something like:
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 120
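As a sketch, these values can be made persistent with a sysctl drop-in
file. The file name below is only an example I made up; any *.conf file under
/etc/sysctl.d/ should do on Debian, and 'sysctl --system' loads it without a
reboot.

```
# /etc/sysctl.d/90-tcp-keepalive.conf (hypothetical file name)
# seconds of idle time before the first keepalive probe is sent
net.ipv4.tcp_keepalive_time = 120
# seconds between two unanswered probes
net.ipv4.tcp_keepalive_intvl = 30
# unanswered probes before the connection is considered dead
net.ipv4.tcp_keepalive_probes = 3
```

With these values an idle connection is probed about every 120 seconds, and
a dead peer is detected after roughly 120 + 3 x 30 = 210 seconds. With the
kernel defaults the first probe is only sent after 7200 seconds (2 hours),
far later than the 20 minutes or so after which my connections started to
fail, which probably explains why enabling keepalives alone was not enough.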
On 18/10/2022 at 11:15, Artur wrote:
Hello,
While renewing my node.js servers and a galera cluster (mariadb) I'm
seeing unexpected behaviour on TCP connections between the node.js
application and mariadb.
There are a lot of connection resets during transfers on the backend side.
My previous (working) setup was based on Debian 10, mariadb 10.5,
node.js 16 (and some dependencies) and haproxy 2.6.
I had a server running several node.js processes and a 3-node galera
mariadb cluster.
To provide some HA, I configured haproxy as a TCP proxy for mariadb
connections.
The usual setup is:
node.js -> haproxy -> mariadb
The node.js application uses a connection pool to maintain several open
connections to the database server, and these may be idle for a long time.
The timeouts are adjusted in haproxy to avoid disconnecting idle
connections.
This setup worked just fine on old servers.
Then I set up new servers on Debian 11: a new mariadb galera cluster
(10.6), a new node.js application server (no real changes in node.js
software versions there) and haproxy (2.6.6 currently).
The global setup of all of this is quite the same as before but not
exactly the same. I tried however to be as close as possible to the
old setup.
Now, once I started the node.js application, the database connections
are established and after about 20 minutes I start to see application
warnings about lost connections to database.
On the haproxy stats page I can see a lot of 'connection resets during
transfers' on the backend side.
On database side I can see idle processes that stay there even if I
close node.js application or restart haproxy. These have to timeout or
be killed to disappear. As if there was no communication any more
between haproxy and mariadb (on these tcp connections).
At the same moment other database connections are established or
continue to function. Maybe something related to idle connections?
In case it helps: all these servers are VMs in OVH public cloud and
communications between servers are established through a private vlan
in the same datacenter.
If I remove haproxy from the chain (node.js -> mariadb) I no longer see
any errors. But I don't understand why it worked fine before
and behaves this way now...
Any help is welcome.
My current haproxy setup:
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private
# See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
ssl-dh-param-file /etc/haproxy/ssl/dhparams.pem
tune.ssl.default-dh-param 2048
maxconn 50000
#nosplice
defaults
log global
option dontlognull
option dontlog-normal
timeout connect 5000
timeout client 50000
timeout server 50000
#option tcpka
errorfile 400 /etc/haproxy/errors/400.http
errorfile 403 /etc/haproxy/errors/403.http
errorfile 408 /etc/haproxy/errors/408.http
errorfile 500 /etc/haproxy/errors/500.http
errorfile 502 /etc/haproxy/errors/502.http
errorfile 503 /etc/haproxy/errors/503.http
errorfile 504 /etc/haproxy/errors/504.http
option splice-auto
option splice-request
option splice-response
frontend db3_front
bind 127.0.1.1:3306
mode tcp
# haproxy client connection timeout is 1 second longer than the
# default mariadb wait_timeout, which is 28800 seconds
# this prevents haproxy from closing an idle connection for no reason
timeout client 28801s
maxconn 10000
no log
default_backend db3_back
backend db3_back
mode tcp
# haproxy server connection timeout is 1 second longer than the
# default mariadb wait_timeout, which is 28800 seconds
# this prevents haproxy from closing an idle connection for no reason
timeout server 28801s
option mysql-check user hacheck post-41
fullconn 10000
timeout check 10s
server db3sbg5 10.140.154.94:3306 maxconn 10000 check on-marked-down shutdown-sessions
server db3de1 10.140.3.131:3306 maxconn 10000 backup check on-marked-down shutdown-sessions
server db3gra5 10.140.103.12:3306 maxconn 10000 backup check on-marked-down shutdown-sessions
[similar redis proxy config removed]
listen stats
bind *:443 interface ens3 ssl crt /etc/haproxy/ssl/server.pem alpn h2,http/1.1
mode http
no log
maxconn 100
stats enable
stats uri /...
stats refresh 5s
stats show-legends
stats show-node
stats admin if TRUE
...
I tried some modifications to the haproxy config (nosplice or tcpka) but
the errors are still there.
I also tried previous haproxy versions (2.6.5, 2.6.4) but that doesn't
solve the problem either.
--
Best regards,
Artur