Josip Rodin wrote:
> Returning to the original problem, in my pool of two fail-over home servers
> I now have both of them set up with "status_check = none".
2.1.7 has some changes in proxy fail-over. The *first* packet that discovers that a home server is dead is no longer rejected. Instead, it fails over to the second home server. This makes proxying more robust.

> My upstream proxy maintainers refuse to implement decent status checks,
> so I'm forced to do this for now. I can do a status check with an entry
> from a particular HL RADIUS that I happen to control, but that just creates
> a daisy-chain of SPoFs. :/ They insist that I not do anything like this,
> but that I set up my server so that it stubbornly tries their first server,
> then if that fails their second server, for each request.

That's stupid. It increases latency and bandwidth use, and decreases reliability. The Status-Server draft says that using Status-Server is preferable to the alternatives. Maybe they'll follow it once it becomes an RFC.

> Now, when a request comes through that gets discarded by the first proxy
> (because it itself times out on a random HL RADIUS), that one gets marked as
> a zombie. Strangely enough, my server keeps it marked as a zombie even after
> several minutes (long past any of the zombie_period and revive_interval
> periods I've kept in the configuration). My server keeps talking only with
> the second server which is in the 'alive' state, and ignores the zombie.

Hmm... the "zombie_period" timers depend on continued packet streams. If the NAS doesn't re-transmit packets, then the home server could stay zombie for a while. I'll have to take a look at that.

> After re-reading proxy.conf comments, this actually looks logical - there is
> no kind of a status check that would unmark it as a zombie. revive_interval
> can resurrect it from the 'dead' state, but not from the zombie state. Also
> this part of the revive_interval comment is a bit confusing:
>
> # As a result, we recommend enabling status checks, and
> # we do NOT recommend using "revive_interval".
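For reference, a home server with Status-Server checks enabled would be configured in proxy.conf roughly like this (a sketch only; the server name, address, secret, and timer values are placeholders, not taken from this thread):

```
home_server example_home {
        type = auth
        ipaddr = 192.0.2.10        # placeholder address
        port = 1812
        secret = testing123        # placeholder shared secret

        # If no reply arrives within response_window seconds, the
        # server is marked "zombie" and probed with Status-Server
        # packets every check_interval seconds.
        response_window = 20
        zombie_period = 40
        status_check = status-server
        check_interval = 30

        # Number of probe replies needed before it is marked "alive".
        num_answers_to_alive = 3
}
```

With "status_check = status-server", a zombie or dead server is actively probed and revived as soon as it answers, instead of sitting idle until revive_interval fires.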
> #
> # The "revive_interval" is used ONLY if the "status_check"
> # entry below is not "none". Otherwise, it will not be used,
> # and should be deleted.
>
> So it's supposed to be a crutch only for people who *have* status checks,
> but not a crutch for those of us who do *not* have status checks.

Huh? That's not what it says. It says "revive_interval" is ONLY for people who have "status_check = none", i.e. no status checks.

> What is a crutch for this situation? A cron job that keeps doing
> radmin -e 'set home_server state X Y alive'? :)

If you don't have status-checks, then the "revive_interval" should apply. If it's not being applied, that should be fixed.

Alan DeKok.
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
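For those stuck with "status_check = none", the fallback being discussed would look roughly like this in proxy.conf (a sketch; the server name, address, secret, and interval are placeholders):

```
home_server upstream_primary {
        type = auth
        ipaddr = 198.51.100.1      # placeholder address
        port = 1812
        secret = s3cret            # placeholder shared secret

        # No status checks: there is nothing to probe a zombie or
        # dead server, so once marked dead it is blindly marked
        # alive again after revive_interval seconds.
        status_check = none
        revive_interval = 120
}
```

This is exactly the crutch the thread describes: without status checks the server cannot verify the home server recovered, so revive_interval simply assumes it after the timer expires.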

