hey,

first, use "option mysql-check" for better health checking. you'll have to add a check user with access to the database; the how-to is in configuration.txt (https://www.haproxy.org/download/2.1/doc/configuration.txt).  "option httpchk" is doing nothing for you because the backend isn't talking HTTP and the mode is tcp for MySQL.
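a minimal sketch of what that looks like (the user name haproxy_check is just an example, pick your own; the host part of the MySQL user has to match where HAProxy connects from):

```
# on the MySQL/Percona side: create a check user -- no password or
# privileges are needed, HAProxy only completes the handshake
CREATE USER 'haproxy_check'@'%';

# in the HAProxy backend (mode tcp):
backend main_writer
     mode tcp
     option mysql-check user haproxy_check
     server db1 <ip>:3306 check
```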

second, look into the PROXY protocol: HAProxy can send the client IP to the backend ahead of the TCP payload, similar to the X-Forwarded-For header in HTTP.  you need to add a line like:

proxy-protocol-networks=::1, localhost, <ip.add.re.ss/nm>

into the my.cnf or mariadb-server.cnf file.  replace the ip with a network in CIDR notation, without the angle brackets, to specify the client networks that are allowed to connect using the PROXY protocol.  then add the "send-proxy-v2" option to the server line in the HAProxy backend.  mine is:

server mariadb1 192.168.88.1:3306 check inter 10000 send-proxy-v2
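putting the server-side piece together, the [mysqld] section would look something like this (192.168.88.0/24 is just an example network to match my server line above; the ::1 and localhost entries also allow a local HAProxy to send the header):

```
# my.cnf / mariadb-server.cnf
[mysqld]
proxy-protocol-networks=::1, localhost, 192.168.88.0/24
```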

this will help you better identify the client that is losing the connection.

if there is a firewall between the client and HAProxy, look at the logs there.  the firewall could be reaping long-running connections if it hits a connection threshold, gets busy, or has a policy update pushed to it.  something in between could be the issue.

hope this helps,

brendan

On 8/1/23 8:11 PM, David Greenwald wrote:
Hi all,

Looking for some help with a networking issue we've been debugging for several days. We use haproxy to TCP load-balance between Kafka Connectors and a Percona MySQL cluster. In this set-up, the connectors (i.e., Java JDBC) maintain long-running connections and read the database binlogs. This includes regular polling.

We have seen connection issues starting with haproxy 2.4 and persisting through 2.8 which result in the following errors in MySQL:

|2023-07-31T17:25:45.745607Z 3364649 [Note] Got an error reading communication packets|

As you can see, this doesn't include a host or user and is happening early in the connection. The host cache shows handshake errors here regularly accumulating.

We were unable to see errors on the haproxy side with tcplog on and have been unable to get useful information from tcpdump, netstat, etc.

We are aware FE/BE connection closure behavior changed in 2.4. The 2.4 option of idle-close-on-response seemed like a possible solution but isn't compatible with mode tcp, so we're not sure what's happening here or next steps for debugging. Appreciate any help or guidance here.

We're running haproxy in Kubernetes using the official container, and are also not seeing any issues with current haproxy versions with our other (Python) applications.

A simplified version of our config:

global
     daemon
     maxconn 25000

defaults
     balance roundrobin
     option dontlognull
     option redispatch
     timeout http-request 5s
     timeout queue 1m
     timeout connect 4s
     timeout client 50s
     timeout server 30s
     timeout http-keep-alive 10s
     timeout check 10s
     retries 3

frontend main_writer
     bind :3306
     mode tcp
     timeout client 30s
     timeout client-fin 30s
     default_backend main_writer

backend main_writer
     mode tcp
     balance leastconn
     option httpchk
     timeout server 30s
     timeout tunnel 3h
     server db1 <ip>:3306 check port 9200 on-marked-down shutdown-sessions weight 100 inter 3s rise 1 fall 2
     server db2 <ip>:3306 check port 9200 on-marked-down shutdown-sessions weight 100 backup
     server db3 <ip>:3306 check port 9200 on-marked-down shutdown-sessions weight 100 backup





David Greenwald

Senior Site Reliability Engineer


david.greenw...@discogsinc.com


