Hi,

We are planning to use LVS for a setup with a lot of (millions of) concurrent, mostly idle connections, and were setting up the sync daemon to avoid a reconnect flood when the master fails.
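(For anyone wanting to reproduce this: the sync daemons are just the standard ipvsadm master/backup daemons, started roughly like this; the interface name is only illustrative, not necessarily what we used:)

    # on the active director
    ipvsadm --start-daemon master --mcast-interface eth0
    # on the backup director
    ipvsadm --start-daemon backup --mcast-interface eth0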
Originally I was planning to ask for help, but it turned out to be one of those cases where you go over the problem description and refine the details until the problem ceases to exist. So instead I'll post the results and what we needed to tune to get it working.

Short summary: the sync daemon works very well even at a high connection rate if you increase the rmem_default and wmem_default sysctls.

Initially there was a problem with the sync_master daemon sending updates. Since it only sent updates once per second, the send buffer of its socket filled up and we got ip_vs_sync_send_async errors in the kernel log. We decreased the sleep time to 100 ms, which gave slightly better results, but net.core.wmem_max and net.core.wmem_default also needed increasing (which probably means we could have left the kernel unchanged). After that we had problems on the sync_backup daemon side, whose receive buffer now filled up from time to time and resulted in lost sync packets (visible as UDP receive errors). So we also increased the rmem sysctls quite a bit, which solved that problem as well. The sysctl settings are sketched at the end of this mail.

Another consideration for mostly idle connections seems to be choosing appropriate sync_threshold and tcp timeout (ipvsadm -L --timeout) values. Our current plan is to increase the tcp timeout to 30 minutes (1800 s) and reduce sync_threshold to (3 10), so that the connections stay current on the backup even with relatively infrequent keepalives being sent.

Hardware for testing was a few 2x quad-core Opteron boxes with 16 GB of memory, dual e1000 and onboard dual bnx network cards, sync_threshold = 0 1 (sync on every packet, for testing), using LVS-NAT. Set up and run by a very diligent coworker :)

Some results:
- 8.5 million connections, all synced
- ~100 Kpackets/s of keepalives on the external interface
- ~900 packets/s of sync daemon traffic
- just over 100 Mbps of traffic (short packets)
- primary LVS: ~1% of one core for the sync_master daemon, one core 10-40% in softirq (ipvs?), ~1.7 GB of memory used in total
- secondary LVS: ~10% of one core for the sync_backup daemon, one core 20% in softirq (ipvs?), ~1.7 GB of memory used in total

Failover with keepalived worked as expected once all connections were established.

The likely limiting factor seems to be the one core at 40% in softirq. This was also the core which serviced the bnx network card, so it's possible that switching entirely to e1000 would alleviate the problem (the core responsible for e1000 was at ~10% in softirq). Also, time spent in softirq was not really consistent and sometimes dropped quite low (maybe an altogether different problem). Interrupt load was low (8K/s in total) with both e1000 and bnx cards in use, although we still superstitiously suspect Broadcom is not quite as scalable as Intel.
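For reference, the socket buffer tuning boils down to something like the following. The values here are only an illustration; we simply made the buffers large rather than looking for minimal working values:

    # let the sync_master send buffer and sync_backup receive buffer grow
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.core.wmem_default=16777216
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.rmem_default=16777216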
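And the timeout/sync_threshold plan from above, expressed as commands. Only the tcp timeout is the value we actually intend to change; the tcpfin and udp values below are just what we believe are the defaults, and the sync_threshold comment reflects our understanding of its semantics:

    # tcp established timeout 1800s (30 min), tcpfin 120s, udp 300s
    ipvsadm --set 1800 120 300
    # verify
    ipvsadm -L --timeout
    # start syncing a connection after its 3rd packet, then resync every 10th packet
    sysctl -w net.ipv4.vs.sync_threshold="3 10"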
Siim