On 3/22/2011 12:42 AM, Patrick Schaaf wrote:
> On Mon, 2011-03-21 at 17:50 -0700, Erik Schorr wrote:
>
> [SNIP wish of identifying individual UDP transaction failure and
> reassignment to a different real server]
>
>> Is this possible?
>
> Not without investing in the implementation of an extension of the
> kernel part of IPVS.
>
> No part of IPVS cares about / tries to do something regarding
> reassignment of individual "failed" flows to different real servers.
>
> It is up to a userlevel health checking application (keepalived,
> ldirectord) to test and disable real servers that fail.

Understood. It's not necessarily mitigation of failure, but enforcement
of best-effort forwarding within a deadline.
> The kernel part just distributes new flows, according to the chosen
> scheduler, to any of the non-weight-0 real servers configured, and
> routes packets of known flows to the same real server as chosen
> initially.
>
> What you desire could work in NAT or TUN mode, but would need roughly
> these new features:
>
> A) a configuration variable, per virtual service, indicating that more
>    elaborate processing is desired, and in which time interval a reply
>    should be received.
> B) keeping a copy of the data (UDP packet, TCP SYN) sent initially to a
>    real server, the copy hanging off the IPVS connection (flow)
>    structure.
> C) put such new flows on a tight timeout configured by A)
> D) when a reply packet is received and its flow identified (which
>    already must happen for e.g. NAT mode to work), mark the flow as OK
>    and remove it from the tight timeout schedule
> E) when the tight timeout expires, rerun the scheduler selection,
>    excluding the initially selected real server (*), and send the
>    remembered copy of the datagram / TCP SYN to the newly selected real
>    server.
>
> *) should one such failure set the weight of the failing real server
>    to 0? Or decrease its weight? Or do nothing like that? The real
>    server might work almost perfectly, only having dropped somehow
>    that single datagram.

This is pretty much dead-on. For this last part, I think a configurable
threshold of "handoff-misses per time period" must be exceeded before a
real server's weight is reduced. More than one handoff failure per 10
seconds, perhaps, would decrease the weight by a percentage. (A rough
sketch of what I mean is at the end of this mail.) Of course, if a
monitor detects a hard failure of a real server/service, then set the
weight to 0.

> Further consideration might be given to the desired behaviour when,
> microseconds after the E) reassignment decision, the first real server
> response is received, because it just sat in some queue in between for
> a bit longer than anticipated.

In this case, I believe it would be fine for the load balancer to simply
drop the late reply.

Has anyone else encountered a situation with these sorts of
requirements? Load balancing and service monitoring are great, but
guaranteed connection-level reliability and deadline enforcement are
things I haven't seen offered outside of very expensive commercial
systems. It would be interesting to know how many other people might
benefit from such features.

> best regards
>   Patrick
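
So we're sure we're looking at the same picture, here is a very rough
userspace sketch of the B)-E) bookkeeping. To be clear, none of this is
IPVS code: struct pending_flow, pick_real_server() and the 50 ms
deadline are names and numbers I made up purely to illustrate the
mechanism (save the first datagram, arm a tight deadline, and resend the
saved copy to a different real server when the deadline expires).

/* Rough illustration only -- NOT IPVS kernel code.  All identifiers
 * and values here are invented for the sake of the discussion. */
#include <stdio.h>
#include <string.h>

#define NUM_REAL_SERVERS 3
#define MAX_DGRAM        1500
#define DEADLINE_MS      50          /* "tight timeout" from A)/C), example value */

struct pending_flow {
    unsigned char saved_dgram[MAX_DGRAM]; /* copy of the initial datagram (B) */
    size_t        saved_len;
    int           real_server;            /* server picked by the scheduler */
    long          deadline_ms;            /* absolute deadline (C) */
    int           answered;               /* set when a reply arrives (D) */
};

static int weights[NUM_REAL_SERVERS] = { 1, 1, 1 };

/* Stand-in for the real scheduler: any non-zero-weight server,
 * optionally excluding the one that just missed its deadline (E). */
static int pick_real_server(int exclude)
{
    for (int i = 0; i < NUM_REAL_SERVERS; i++)
        if (i != exclude && weights[i] > 0)
            return i;
    return -1;                            /* nothing left to try */
}

static void send_to_server(int server, const unsigned char *buf, size_t len)
{
    (void)buf;
    printf("forwarding %zu bytes to real server %d\n", len, server);
    /* a real implementation would transmit the packet here */
}

/* Reply seen in time: mark the flow OK and (conceptually) drop it from
 * the tight-timeout schedule (D). */
static void handle_reply(struct pending_flow *f)
{
    f->answered = 1;
}

/* Deadline expired without a reply: rerun the selection excluding the
 * first server and resend the remembered datagram (E). */
static void handle_deadline_expiry(struct pending_flow *f, long now_ms)
{
    if (f->answered)
        return;
    int next = pick_real_server(f->real_server);
    if (next < 0) {
        printf("no alternative real server, giving up\n");
        return;
    }
    printf("server %d missed the deadline, reassigning flow\n", f->real_server);
    f->real_server = next;
    f->deadline_ms = now_ms + DEADLINE_MS;   /* re-arm the tight timeout */
    send_to_server(next, f->saved_dgram, f->saved_len);
}

int main(void)
{
    unsigned char dgram[] = "dns-query-or-similar";
    struct pending_flow f = { .saved_len = sizeof(dgram), .answered = 0 };

    memcpy(f.saved_dgram, dgram, sizeof(dgram));
    f.real_server = pick_real_server(-1);
    f.deadline_ms = DEADLINE_MS;             /* pretend "now" is 0 ms */
    send_to_server(f.real_server, f.saved_dgram, f.saved_len);

    /* pretend the deadline passed with no reply from the first server */
    handle_deadline_expiry(&f, DEADLINE_MS);

    /* the second server answers in time: mark the flow OK (D) */
    handle_reply(&f);
    printf("flow answered: %d\n", f.answered);
    return 0;
}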

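And the miss-threshold idea from above, equally rough and equally made
up (the names, the 10-second window and the 25% cut are placeholders):
each real server counts deadline misses per window, and only when the
count exceeds the threshold inside one window is its weight reduced by a
percentage. A hard failure detected by the health checker would still
zero the weight directly.

/* Rough illustration only -- not IPVS code. */
#include <stdio.h>
#include <time.h>

struct rs_state {
    int    weight;          /* current scheduling weight */
    int    misses;          /* deadline misses in the current interval */
    time_t interval_start;  /* when the current interval began */
};

#define MISS_INTERVAL_SECS 10   /* e.g. one 10-second window */
#define MISS_THRESHOLD     1    /* misses tolerated before acting */
#define WEIGHT_CUT_PERCENT 25   /* reduce the weight by this much */

/* Record one deadline miss and decide whether to penalize the server. */
static void record_miss(struct rs_state *rs, time_t now)
{
    if (now - rs->interval_start >= MISS_INTERVAL_SECS) {
        rs->interval_start = now;   /* new window: forget old misses */
        rs->misses = 0;
    }

    rs->misses++;
    if (rs->misses > MISS_THRESHOLD && rs->weight > 0) {
        rs->weight -= rs->weight * WEIGHT_CUT_PERCENT / 100;
        if (rs->weight < 1)
            rs->weight = 1;   /* keep it schedulable; only a hard failure zeroes it */
        printf("server penalized, new weight %d\n", rs->weight);
        rs->misses = 0;       /* start counting again after the penalty */
    }
}

int main(void)
{
    struct rs_state rs = { .weight = 100, .misses = 0,
                           .interval_start = time(NULL) };
    time_t now = time(NULL);

    /* two misses in the same window: the second one crosses the threshold */
    record_miss(&rs, now);
    record_miss(&rs, now);
    printf("final weight: %d\n", rs.weight);
    return 0;
}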