Re: Help! HAProxy randomly failing health checks!

Zachary Punches Sat, 19 Mar 2016 16:33:30 -0700

Looking at a tcpdump capture, it looks like some SSL connections are getting 
reset. The code is (Warn/Sequence): Connection reset (RST). Is that helpful at all?
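
To see which side is sending the resets, it can help to capture only RST segments. A minimal sketch, assuming the SSL listener is on port 443 and the interface is eth0 (neither is stated in this thread); rst_filter is a hypothetical helper name:

```shell
#!/bin/sh
# Build a pcap filter that matches only TCP segments with the RST flag set
# on the given port. The port number is an assumption, adjust as needed.
rst_filter() {
    printf 'tcp port %s and tcp[tcpflags] & tcp-rst != 0' "$1"
}

# Example (needs root; eth0 and 443 are assumptions):
#   tcpdump -nn -i eth0 "$(rst_filter 443)"
```

The source address on each captured RST shows whether the proxy or the remote checker tore the connection down.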

From: Igor Cicimov <[email protected]>
Date: Wednesday, March 16, 2016 at 6:51 PM
To: Zachary Punches <[email protected]>
Cc: Baptiste <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Help! HAProxy randomly failing health checks!

On Thu, Mar 17, 2016 at 12:46 PM, Igor Cicimov <[email protected]> wrote:

On Thu, Mar 17, 2016 at 11:14 AM, Zachary Punches <[email protected]> wrote:
I want to say the average is around 4-6 connections a second? Super minimal.

From what I’ve seen in the logs during the SSL errors, the log hangs then 
outputs a bunch of SSL errors all at once.

Here is the output from sysctl -p:
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
error: "net.bridge.bridge-nf-call-ip6tables" is an unknown key
error: "net.bridge.bridge-nf-call-iptables" is an unknown key
error: "net.bridge.bridge-nf-call-arptables" is an unknown key
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296

What would lowering the tune.ssl.default-dh-param to 1024 do?

From: Igor Cicimov <[email protected]>
Date: Wednesday, March 16, 2016 at 5:01 PM
To: Zachary Punches <[email protected]>
Cc: Baptiste <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Help! HAProxy randomly failing health checks!

On Thu, Mar 17, 2016 at 10:55 AM, Zachary Punches <[email protected]> wrote:
Thanks for the reply!

OK, so based on what you saw in my config, does it look like we’re misconfigured 
enough to cause this to happen?

If we were misconfigured, one would assume we would go down all the time, yeah?

From: Igor Cicimov <[email protected]>
Date: Wednesday, March 16, 2016 at 4:50 PM
To: Zachary Punches <[email protected]>
Cc: Baptiste <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Help! HAProxy randomly failing health checks!

On Thu, Mar 17, 2016 at 10:47 AM, Igor Cicimov <[email protected]> wrote:

On Thu, Mar 17, 2016 at 5:29 AM, Zachary Punches <[email protected]> wrote:
I’m not; these guys aren’t sitting behind an ELB. They sit behind Route53 
routing. If one of the proxy boxes fails 3 checks in 30 seconds (with 4 checks 
done a second), then Route53 changes its routing from the first proxy box to the 
second.

On 3/15/16, 9:46 PM, "Baptiste" <[email protected]> wrote:

>Maybe you're checking a third party VM :)
>

AFAIK the Route53 health checks come from different points around the globe and 
it is possible that at some time of the day AWS has scheduled some specific end 
points to perform the HC. And it is possible that those ones have different SSL 
settings from the ones performing the HC during your day time. I would suggest 
you bring up this issue with AWS support, let them know your SSL cipher 
settings in HAP and ask if they are compatible with ALL their servers 
performing SSL health checks.

I personally haven't seen any issues with failed SSL handshakes coming from AWS 
servers, and I have HAPs running in the AU and UK regions.

Igor

That is if you are absolutely sure that the failed handshakes are not caused by 
overload or misconfigured (system) settings on HAP

I was saying this in regard to system (kernel) settings. For example, assuming 
Unix/Linux, is your net.core.somaxconn actually set *higher* than your maxconn, 
which is set to 30000 and 15000 in HAP? Any other kernel settings you might 
have changed? (output of "sysctl -p" command)

What is your peak-hour load? How many connections/sessions are you seeing on 
each HAP?

Another suggestion is to set tune.ssl.default-dh-param to 1024 and see if 
that helps.

OK, so on a default Ubuntu cloud instance this is what we have:

# sysctl -a | grep maxconn
net.core.somaxconn = 128

which is too low for a production server. Check your value and adjust it to your 
needs.
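
A quick way to compare the two values, as a sketch only: check_backlog is a hypothetical helper, 30000 is the larger maxconn mentioned earlier in this thread, and 32768 is just an illustrative target rather than a recommendation.

```shell
#!/bin/sh
# check_backlog SOMAXCONN MAXCONN
# Prints a verdict and returns 1 when the kernel cap is below maxconn.
check_backlog() {
    if [ "$1" -lt "$2" ]; then
        echo "somaxconn=$1 is below maxconn=$2"
        return 1
    fi
    echo "somaxconn=$1 covers maxconn=$2"
}

# Read the live value on Linux; fall back to 128 (the default quoted above)
# if /proc is not readable.
current=$(cat /proc/sys/net/core/somaxconn 2>/dev/null || echo 128)
check_backlog "$current" 30000 || true

# To raise it at runtime (as root); 32768 is an illustrative value:
#   sysctl -w net.core.somaxconn=32768
# To persist it, add "net.core.somaxconn = 32768" to /etc/sysctl.conf.
```

The kernel silently caps a listener's backlog at somaxconn, so a low value limits the accept queue regardless of what the proxy asks for.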

By the way, what is accept-proxy doing there in your setup? Is there anything 
else in front of HAP using PROXY protocol?

Wait a minute:

bind *:1027 # Health checking port

are you maybe using an HTTPS health check on a non-SSL port?
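
One way to test that from outside is to hit the port with curl over https and look at the exit status. A rough sketch, assuming curl is available; proxy.example.com is a placeholder host and explain_curl_exit is a hypothetical helper:

```shell
#!/bin/sh
# Translate a curl exit status into a hint about the health-check port.
# Codes 7, 28 and 35 are curl's documented statuses for connect failure,
# timeout and SSL connect error.
explain_curl_exit() {
    case "$1" in
        0)  echo "request completed" ;;
        7)  echo "could not connect" ;;
        28) echo "timed out" ;;
        35) echo "TLS handshake failed; the port may be plain HTTP" ;;
        *)  echo "curl exit $1" ;;
    esac
}

# Usage (placeholder host; 1027 is the health-check port from the bind line):
#   curl -sk -o /dev/null --max-time 5 https://proxy.example.com:1027/
#   explain_curl_exit $?
```

If the https request fails with status 35 while the same request over http succeeds, the checker and the bind line disagree about SSL.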
