I have HAProxy installed as a load balancer in front of two Exchange 2010 CAS
servers for SSL offloading, and I am running into significant performance
problems (the site becomes unusable) once it passes about 1000 concurrent
connections. CPU never goes over ~30%, concurrent connections are around 1800
when it falls over, and memory usage is relatively low. At around 800
connections everything seems to work fine. Everything also works well in
testing; it's only when I move our production traffic to HAProxy that I see
problems.

Basically the site stops accepting connections at that point. If I restart
HAProxy it works, but only for a short time before becoming unresponsive
again. I have looked at various OS-level TCP optimizations without any
success. A basic count, something like netstat -an | wc -l, shows about 58K
connections.
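
The raw line count from netstat -an mixes listening sockets, header lines and
Unix sockets in with real connections, so a per-state breakdown is more
telling than the total. A minimal sketch, assuming the Linux net-tools column
layout of netstat -ant; the function name count_tcp_states and the sample
addresses are made up for illustration:

```python
from collections import Counter

# Made-up sample in the Linux `netstat -ant` layout (not real data).
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:443            192.0.2.10:51234        ESTABLISHED
tcp        0      0 10.0.0.5:443            192.0.2.11:40022        ESTABLISHED
tcp        0      0 10.0.0.5:39012          10.0.0.14:80            TIME_WAIT
tcp        0      0 10.0.0.5:39013          10.0.0.14:80            TIME_WAIT
tcp        0      0 10.0.0.5:39014          10.0.0.26:80            TIME_WAIT
"""


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP sockets per state from `netstat -ant` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local, remote, state.
        # Header lines and non-TCP rows are skipped.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Print the states from most to least frequent.
for state, count in count_tcp_states(SAMPLE).most_common():
    print(f"{count:8d} {state}")
```

To run this against the live box, pass the captured output of netstat -ant to
count_tcp_states instead of SAMPLE. A large TIME_WAIT or CLOSE_WAIT share
would point in a different direction than a large ESTABLISHED count.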

The only thing I have found that may be causing this is Outlook Anywhere (RPC
over HTTPS). I did not find the http-no-delay option until after testing, so
I am wondering whether this one setting could cause this type of behaviour.
I assume it might, since connections are hanging until the client timeout. I
have not seen it referenced in any of the example Exchange 2010 or 2013
configs.

I am just wondering if I am on the right track, or if anyone else can share
their experience with offloading Exchange SSL connections, including Outlook
Anywhere clients.

Here are the relevant parts of my config. Note that I did NOT have
http-no-delay set during the test described above; it has been added for
testing in our next maintenance window.

    defaults
        # option http-server-close     # set Connection: close to inspect all HTTP traffic
        option http-keep-alive         # This is actually the default and keeps the connection
                                       # open to both client and server
        option http-no-delay           # forward packets immediately, needed for RPC over HTTPS
        option dontlognull             # Do not log connections with no requests
        option redispatch              # Try another server in case of connection failure
        option contstats               # Enable continuous traffic statistics updates
        retries 3                      # Try to connect up to 3 times in case of failure
        timeout connect 5s             # 5 seconds max to connect to a server
        timeout client 300s            # 5 minute timeout for clients
        timeout server 300s            # 5 minute timeout for servers
        timeout http-keep-alive 1s     # 1 second max for the client to post next request
        timeout http-request 15s       # 15 seconds max for the client to send a request
        timeout queue 30s              # 30 seconds max queued on load balancer
        timeout tarpit 1m              # tarpit hold time
        backlog 10000                  # Size of SYN backlog queue
....
    frontend vs_owa_DOMAIN_https
        bind IP.IP.IP.IP:80 name vs_owa_DOMAIN_http
        bind IP.IP.IP.IP:443 name vs_owa_DOMAIN_https ssl crt /etc/ssl/certs/email.DOMAIN.org.pem
        mode http
        log global
        option httplog
        capture request header User-Agent len 64
        capture request header Host len 32
        option forwardfor              # add X-Forwarded-For to headers
        log-format %ci:%cp\ [%t]\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %ST\ %B\ %CC\ %CS\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ {%sslv/%sslc/%[ssl_fc_sni]/%[ssl_fc_session_id]}\ %{+Q}r
        maxconn 5000
        http-request redirect scheme https code 302 if !{ ssl_fc }
        http-request redirect location /owa/ code 302 if { hdr(Host) <WEBMAIL_VIRTUAL_HOST> } { path / }
        default_backend pool_owa_DOMAIN_http

    backend pool_owa_DOMAIN_http
        balance roundrobin
        mode http
        log global
        option prefer-last-server
        option httplog
        option forwardfor
        option redispatch
        stick-table type ip size 10240k expire 30m
        stick on src
        default-server inter 3s rise 2 fall 3
        cookie SERVERID insert indirect nocache
        server SRV1 IP.IP.IP.14:80 maxconn 2000 weight 10 check cookie srv1
        server SRV2 IP.IP.IP.26:80 maxconn 2000 weight 10 check cookie srv2