We have over 90K concurrent persistent TCP connections on a single haproxy
instance facing mobile clients on the public Internet. At around 100K
connections our normal connection rate is about 200/sec; I have no idea
whether that is reasonable for public-Internet mobile clients with a
keepalive of 3-5 minutes.
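As a sanity check (my own back-of-envelope math, not taken from the stats): if every one of ~100K connections were torn down and re-opened once per keepalive period, steady-state churn alone would predict a rate in this range, so 200/sec is actually on the low side of the estimate:

```python
# Back-of-envelope: reconnect rate implied by keepalive churn alone.
# Assumes each connection is re-established once per keepalive interval.
conns = 100_000
keepalive_low_s = 3 * 60    # 180 s
keepalive_high_s = 5 * 60   # 300 s

rate_low = conns / keepalive_high_s    # slowest churn
rate_high = conns / keepalive_low_s    # fastest churn
print(f"expected churn: {rate_low:.0f}-{rate_high:.0f} conn/sec")
```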

Our stats page normally shows:
(process #1, nbproc = 1)
system limits: memmax = unlimited; ulimit-n = 4000096
maxsock = 800060; maxconn = 200000; maxpipes = 200000
current conns = 104402; current pipes = 0/0; conn rate = 164/sec
Running tasks: 24/104441; idle = 10 % 

After an outage in our backends, all clients were disconnected and then
tried to reconnect, and the haproxy machine became unresponsive! Even the
stats page would not load completely, the connection rate spiked to around
6600/sec, and the kernel logged "too many orphaned sockets" in the messages
log. It took us hours to recover, starting with a very low maxconn and
raising it step by step to absorb the rush of reconnecting clients...! :(
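For reference, instead of raising maxconn by hand, haproxy can smooth a reconnect storm itself with frontend rate limiting. A minimal sketch of what we are considering (frontend/backend names and the 2000/sec figure are illustrative, not tested recommendations):

```haproxy
global
    maxconn 200000

frontend mobile_clients            # illustrative name
    bind :443 ssl crt /etc/haproxy/cert.pem
    maxconn 150000                 # per-frontend cap below the global limit
    rate-limit sessions 2000       # admit at most ~2000 new conns/sec; the rest queue in the backlog
    default_backend app
```

With `rate-limit sessions`, excess SYNs wait in the kernel accept queue rather than all being accepted at once, which is roughly what our manual maxconn ramp was approximating.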

We have multiple frontends and backends, use SSL offloading, and have
applied some sysctl tunings.
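For completeness, these are the sysctl knobs that seem most relevant to the orphan/backlog symptoms above (values are illustrative starting points, not tested recommendations):

```
# /etc/sysctl.d/99-haproxy.conf -- illustrative values, tune for your box
net.core.somaxconn = 65535                # cap on listen backlogs
net.ipv4.tcp_max_syn_backlog = 65535      # half-open queue during a reconnect flood
net.ipv4.tcp_max_orphans = 262144         # raises the "too many orphaned sockets" threshold
net.ipv4.ip_local_port_range = 1024 65535 # more source ports toward the backends
net.ipv4.tcp_tw_reuse = 1                 # reuse TIME-WAIT ports for outgoing connections
```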

Is a reconnect rate of several thousand persistent TCP connections per second
too much for a single haproxy box? Which kernel/TCP/haproxy parameters can we
tune for such scenarios?
