I plan to deploy a lab environment to test LVS as a load balancer in front of a group of what could be called nameservers. These servers actually serve telephone call routing, filtering, and translation data over DNS-style UDP queries, and in our production environment each one normally handles 500-2000 queries per second.
The "clients" initiating these queries are ACME session border controllers and various other VoIP/SIP processing equipment. Any failure to pass a lookup to the servers, process it, return a response, and route that response back to the client is considered critical, because it means a call gets dropped or is left with "dead air". Best case, the call is delayed by a few seconds while the request times out and (hopefully) the retransmitted query is handled by a device that is able to respond.
I'm aware of the benefit of lowering the UDP session timeout to 15 seconds for high-volume DNS load balancing and plan to do this. What I'm wondering is whether LVS/IPVS incorporates any method to guarantee delivery of a UDP request packet to a server that is able to respond to it, no matter what.
In other words: a DNS request comes in to the VIP on the load balancer, and the load balancer forwards it (via either routing or NAT) to a real server. If that real server fails to receive the packet or process the query for any reason (a dropped packet on the wire, intermittent CPU saturation, a missed interrupt, etc.), I would like the load balancer to detect that no response has gone back to the client and re-send the same packet (same payload) to another real server in the cluster.
These servers usually respond in less than 50ms, though it may take as long as 100ms. So if 200ms has passed since a request and the chosen server still hasn't responded, the load balancer should retransmit a copy of the original request packet to a different server, without the requesting client ever realizing there was a timeout. Is this possible? When 10,000 requests are being processed per second, dropping even one packet per 100,000 is disastrous for our stats.
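To make the behavior I'm asking about concrete, here is a minimal userspace sketch in Python. It is not an LVS/IPVS configuration or feature, only an illustration of the retry logic: the function name `query_with_failover`, its parameters, and the 200ms default are my own assumptions for the example, and it presumes the queries are safe to send to more than one server.

```python
import socket


def query_with_failover(payload, backends, per_try_timeout=0.2, bufsize=4096):
    """Send `payload` to each backend in turn until one of them replies.

    `backends` is a list of (host, port) tuples. Returns a tuple of
    (reply_bytes, backend_that_answered), or raises TimeoutError if no
    backend answers within `per_try_timeout` seconds each.
    """
    for backend in backends:
        # Use a fresh socket per attempt so a late reply from a server we
        # already gave up on cannot be mistaken for the current answer.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(per_try_timeout)
            # A connected UDP socket only accepts datagrams from this peer.
            sock.connect(backend)
            try:
                sock.send(payload)
                return sock.recv(bufsize), backend
            except (socket.timeout, ConnectionError):
                # No reply in time (or the port was reported unreachable):
                # re-send the identical payload to the next server.
                continue
    raise TimeoutError("no backend answered the query")
```

With two servers and the 200ms default, the worst case in this sketch is roughly 200ms spent on the first server plus the second server's normal response time, which would still be well below a client-side timeout of a few seconds. The open question is whether something equivalent can be done inside the load balancer itself.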
