Thank you for your detailed response, Willy. Some points I should clarify:
> On May 30, 2016, at 1:01 PM, Willy Tarreau <[email protected]> wrote:
>
> Hello,
>
> On Sun, May 29, 2016 at 12:53:21AM +0430, Behrad Zari wrote:
>> We have over 90K concurrent TCP persistent connections on our single
>> haproxy instance facing mobile clients on the Internet. Our normal conn
>> rate for 100K is around 200/sec, which I don't have a clue if it is a
>> good one or not for public internet mobile clients with keepalive 3-5mins.
>
> It's easy to do the computation, 100K/200 = 500 seconds of duration on
> average for a connection. It may indicate that your website is well
> designed (few redirects etc) as it's not quite common to see that low a
> connection rate for a high connection count.

Our case is persistent TCP connections with a 5-minute keepalive, and the clients are mobile phones, not a website. We see around 80 conn/sec at 90K concurrent connections (online users, out of about 400K total users), i.e. every second we see roughly 80 disconnects and reconnects. Is that rate explainable by ISP routers dropping idle connections, or by phones going to sleep and resetting them?

>> Our stats normally show:
>> (process #1, nbproc = 1)
>> system limits: memmax = unlimited; ulimit-n = 4000096
>> maxsock = 800060; maxconn = 200000; maxpipes = 200000
>> current conns = 104402; current pipes = 0/0; conn rate = 164/sec
>> Running tasks: 24/104441; idle = 10 %
>
> Here there's something abnormal: 10% idle (meaning the process spends 90%
> of its time doing something). For only 164 conn/s!
>
>> After a shortage in our backends, when all clients got disconnected and
>> tried to reconnect afterwards, we saw the haproxy machine was not
>> responsive! Even the stats page was not showing up completely, the conn
>> rate was around 6600/sec at once, and there were only some kernel "too
>> many orphaned sockets" entries in the messages log. It took us hours,
>> starting with a very low maxconn and increasing it gradually, to handle
>> the clients' rush of reconnects...! :(
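Incidentally, Willy's 100K/200 = 500 s figure is just Little's law: average connection lifetime equals concurrent connections divided by connection rate. A throwaway sketch applying the same arithmetic to both sets of numbers (the function name is mine, not from haproxy or the thread):

```python
# Little's law: mean connection lifetime = concurrent connections / conn rate.
# Illustrative helper only.
def avg_lifetime_s(concurrent_conns, conn_rate_per_s):
    """Mean connection duration in seconds at steady state."""
    return concurrent_conns / conn_rate_per_s

print(avg_lifetime_s(100_000, 200))  # Willy's example: 500.0 s
print(avg_lifetime_s(90_000, 80))    # our figures:    1125.0 s
```

So at 80 conn/s and 90K concurrent, connections live about 19 minutes on average, well beyond the 5-minute keepalive, which is at least consistent with clients routinely reconnecting.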
> Here's the list of possible things I'm thinking about for such a situation:
>
> - improperly tuned sysctls (somaxconn and tcp_max_backlog) limiting the
>   incoming connection rate; this results in the inability to accept new
>   connections and SYN packets being retransmitted by the client. The CPU
>   should then be saturated.
>
> - too short a "backlog" parameter in the affected frontend (causes the
>   same effect as above).
>
> - a "rate-limit sessions" setting being present. CPU will not be saturated.
>
> - a huge amount of iptables rules on the system causing each new connection
>   to take a long time to evaluate. In this case most of the CPU will be
>   spent in softirq.
>
> - a lot of SSL rekeying. As a rule of thumb, a 3 GHz CPU can produce
>   around 1000 RSA2048 signatures per second. Since you're running at 200/s
>   when everything's OK, less than 20% of the CPU is spent doing handshakes.
>   If all connections have to be re-handshaked, it can take a while (100 sec
>   of pure CPU just for RSA). This can definitely cause large delays. The
>   recent TLS ticket secret manipulation that was introduced in 1.7-dev can
>   definitely help regarding this.
>
> - a large number of orphans can eat a lot of socket memory. It may be
>   possible that you experienced some packet drops because of a large number
>   of orphans. But at the same time the fact that the kernel complained
>   about them indicates it got rid of them, which means that you reached the
>   orphan limit before reaching the memory limit, so it should not be an
>   issue in fact.

We had 100% of a single core; SSL offloading was eating the CPU (every connection setup had to be decrypted again). We tried what you had proposed on the mailing list: terminating SSL with nbproc across 2-n cores and proxying to a local plain frontend that continues as usual. This helped us overcome the situation for now, but we have lost correct combined stats... :(

>> We have multiple frontends and backends, and we also use SSL offloading
>> and some sysctl tunings.
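For reference, the nbproc split we ended up with looks roughly like the sketch below. All names, ports, paths and core counts are illustrative, not our real configuration; the idea is simply SSL-terminating binds pinned to extra processes, feeding decrypted traffic over a local socket into a plain frontend on process 1:

```haproxy
global
    nbproc 4
    # pin each process to its own core (illustrative mapping)
    cpu-map 1 0
    cpu-map 2 1
    cpu-map 3 2
    cpu-map 4 3

# TLS terminators on processes 2-4, pure TCP pass-through after decryption
frontend ft_ssl
    mode tcp
    bind :443 ssl crt /etc/haproxy/site.pem process 2
    bind :443 ssl crt /etc/haproxy/site.pem process 3
    bind :443 ssl crt /etc/haproxy/site.pem process 4
    default_backend bk_clear

backend bk_clear
    mode tcp
    server local unix@/var/run/haproxy-clear.sock send-proxy

# the "normal" frontend on process 1 keeps doing the real LB work
frontend ft_clear
    mode http
    bind unix@/var/run/haproxy-clear.sock accept-proxy process 1
    default_backend bk_app

backend bk_app
    server app1 10.0.0.11:8080 check
```

The send-proxy/accept-proxy pair preserves the original client address across the local hop; the stats drawback is exactly what we hit, since each process keeps its own counters.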
>> Is a multi-thousand persistent TCP conn rate too much for a single
>> haproxy box?
>
> No, but it definitely requires some careful tuning. At least the fact that
> it usually works seems to indicate to me that you've done this tuning (or
> part of it).

We run on a VM. Do you recommend running on bare metal instead? Or a single haproxy box in an SSL-offload role in front of another haproxy acting as a normal load balancer? I've also found that Varnish's Hitch is an active project; wouldn't it be better to delegate the SSL work to it instead?

>> Which kernel/tcp/haproxy parameters can we use to tune for such scenarios?
>
> Well, a lot :-)
>
> Please start by checking the points above to see if something may match
> your case.
>
> Regards,
> Willy
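As a starting point on the sysctl side, these are the knobs named in the list above (tcp_max_backlog presumably meaning net.ipv4.tcp_max_syn_backlog). The values are only illustrative examples to test against one's own workload, not recommendations:

```
# /etc/sysctl.d/90-haproxy.conf -- illustrative values only
# Cap applied to listen() backlogs:
net.core.somaxconn = 10000
# Half-open (SYN) queue size:
net.ipv4.tcp_max_syn_backlog = 10000
# Limit on orphaned sockets before the kernel starts resetting them:
net.ipv4.tcp_max_orphans = 262144
```

A matching "backlog 10000" on the busy haproxy frontend would then keep the per-listener backlog from being silently clipped to a lower default.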

