On Fri, May 15, 2009 at 11:33:03AM -0400, Sean Millichamp wrote:
> On Fri, 2009-05-15 at 09:25 +1000, Simon Horman wrote:
> > On Thu, May 14, 2009 at 05:35:27PM -0400, Sean Millichamp wrote:
> > > This implements a global or per-virtual service option indicating how
> > > many successive check failures must occur before ldirectord will remove
> > > (or quiesce) the real server. A check success will reset the counter of
> > > failures.
> >
> > I wonder if it would be better for it to replace the existing
> > checkcount implementation, which seems to do the same thing but is only
> > implemented for ping checks. failurecount does seem to be a better name;
> > perhaps it could be an alias for checkcount, or vice versa?
>
> I agree that failurecount could probably replace checkcount, with one
> caveat: if administrators have a checkcount set globally that they
> expect to only affect pings, and we alias it so that it affects every
> check type, then an update will change the behaviour they had before.
>
> Now, you could argue that an administrator of a Linux load balancer
> running ldirectord should know better than to upgrade a key piece of
> software without reading the documentation and changelogs. I would
> agree, but surely someone will upgrade without doing that level of
> investigation.
>
> I feel the right approach is to keep both for now, and perhaps mark
> checkcount as deprecated, with a configuration warning that it will be
> removed in a future release. Thoughts?
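The counter behaviour being proposed can be sketched roughly as follows. This is a minimal illustration in Python, not ldirectord's actual Perl code; apart from the failurecount directive name itself, all identifiers here are invented for the example.

```python
# Sketch of the proposed failurecount semantics: a real server is only
# considered failed after `failurecount` consecutive check failures, and
# any successful check resets its counter to 0. Whether "failed" means
# removal or quiescing depends on the quiescent setting, which this
# sketch does not model.

class RealServerState:
    def __init__(self, failurecount):
        self.failurecount = failurecount  # threshold from the config
        self.failures = 0                 # consecutive failures so far
        self.up = True                    # currently considered healthy?

    def report_check(self, ok):
        """Feed one health-check result; return True if the state changed."""
        if ok:
            self.failures = 0
            if not self.up:
                self.up = True            # restore on first success
                return True
            return False
        self.failures += 1
        if self.up and self.failures >= self.failurecount:
            self.up = False               # remove or quiesce the real server
            return True
        return False


# With failurecount=3, two failures followed by a success leave the
# server up with its counter reset; three consecutive failures take it down.
s = RealServerState(failurecount=3)
for ok in (False, False, True):
    s.report_check(ok)
assert s.up and s.failures == 0

for ok in (False, False, False):
    s.report_check(ok)
assert not s.up
```

A value of 1 reduces to the pre-patch behaviour of acting on the first failed check, which matches the documentation text quoted below.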
That is a very good point, I agree.

> > > +The number of consecutive times a failure will have to be reported by a
> > > +check before the realserver is removed from the kernel's LVS table. A
> > > +value of 1 will have the realserver removed on the first failure. A
> > > +successful check will reset the failure counter to 0.
> >
> > I think it would be better to say something like "before the realserver
> > is considered to have failed", as with quiescent=yes the server won't
> > actually be removed from the kernel's LVS table.
>
> Agreed.
>
> [crazy patch snipped]
>
> > This logic looks a bit hairy, though not unexpectedly so.
> > How well tested is it?
>
> Indeed! It took me a while to unravel exactly what was going on and how
> it needed to behave. We have been running this patch for about two weeks
> now without issues. Our heavily loaded servers will still fail checks
> periodically (as evidenced by output in the monitorfile), but they don't
> get removed unless the checks continue to fail. Also, prior to
> deploying it I tried to simulate a number of failure scenarios that we
> might see in our environment (quiescent=yes, fork=yes, connect and
> negotiate check types) and all worked as desired.
>
> I am attaching a patch on top of the previous patch that adds deprecation
> warnings and cleans up the failurecount documentation (I don't see a way
> in hg to export a combined patch).

Thanks. This looks good. I'll review it a bit more closely once I've had
some sleep.

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - [email protected]
Send requests to [email protected]
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
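For readers following the thread, a configuration fragment of the kind being discussed might look like the sketch below. This is a hypothetical illustration: the addresses are examples, and since failurecount was a new directive at the time of this thread, the exact syntax and availability should be checked against the ldirectord documentation shipped with your version.

```
# Hypothetical ldirectord.cf fragment (illustration only)
checktimeout = 10
checkinterval = 5
quiescent = yes        # failed servers are quiesced (weight 0), not removed
failurecount = 3       # global: three consecutive failures before acting

virtual = 192.0.2.10:80
        real = 192.0.2.21:80 masq
        real = 192.0.2.22:80 masq
        service = http
        checktype = negotiate
        failurecount = 5   # per-virtual-service override
```

With quiescent=yes, as noted above, a server that reaches its failure threshold is weighted to 0 rather than removed from the kernel's LVS table, so existing connections can drain.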
