On Jul 11, 2009, at 2:14 AM, Alan DeKok wrote:

Philip Molter wrote:
Yes, this is the configuration I'm currently running, and it's not
working for me. I have a radclient sending a request, retrying 10 times
on a 5-second timer, and after 10 retries, it still hasn't gotten a
response. After the second retry, the proxy has marked the server as at
least a zombie and started status-checks, but every retransmit after
that is getting a cached result of no response.
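
For reference, the retry behaviour I describe can be reproduced with a radclient invocation along these lines (host, secret, and credentials are placeholders):

```
# 10 retries on a 5-second timer against the proxy
echo "User-Name = test, User-Password = test" | \
    radclient -x -r 10 -t 5 proxy.example.com:1812 auth testing123
```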

 Could you possibly try READING my messages?

 The default configuration does NOT include the "do_not_respond"
policy. *YOU* are the one who configured that, as I have said multiple
times.

 If you don't want it to get the cached "do not respond" policy, then

        DON'T CONFIGURE IT

 It's that easy.
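
(For anyone following along: the "do_not_respond" policy being discussed is the one shipped in policy.conf, wired into the virtual server roughly like this in the 2.x layout -- a sketch, not my exact config:)

```
# policy.conf (shipped with the server)
do_not_respond {
        update control {
                Response-Packet-Type := Do-Not-Respond
        }
        handled
}

# sites-enabled/default -- run when a proxied request gets no reply
Post-Proxy-Type Fail {
        do_not_respond
}
```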

I do not want to get ANY cached response. I do not want to get any Access-Reject. If I do not configure a 'do not respond' response, I get an Access-Reject, which is even worse because my end-client gets an error when he should not. What I want is for a no-response from a home server to be treated as a no-response to the NAS, and the subsequent retransmit from the NAS to be processed as a retransmit to a different home server.

This is what I want to happen

client req ->  proxy
              proxy req ->  home server #1
client ret ->  proxy
              proxy ret ->  home server #1
             [proxy fails home server #1 for lack of response]
client ret ->  proxy
              proxy req ->  home server #2
              proxy <- resp home server #2
client <- resp proxy

 It does that (mostly).  But only if you don't break the server.

No, it does not do that at all. I have yet to see a retransmit from a client actually get tried on a different server than the one used for the original request. Once the proxy fails to receive a response from the originally chosen home server, it handles the packet as a failure. If it sends back an Access-Reject, the request is rejected by the NAS to the client and the NAS stops retrying THAT REQUEST (i.e. the end-client gets an error). If I add a configuration to not send back anything, then the NAS will retransmit, but as you have made abundantly clear, the proxy remembers that it sent back no response to the original request and skips all further processing of the retransmit.

I have set response_windows and zombie_periods to minimums. I have set response_windows and zombie_periods to maximums. For a given single request, only one home server is tried, and if that home server is down, the request and any other retransmits of that request will not succeed. Yes, if the NAS sends another separate request with a different ID, it will be proxied to a different home server, but that does not help the poor guy who had the hard luck of his request hitting the bad home server. He will get an error message. He will have to retry or call support or whatever.

This is what happens without a post-proxy config:

client req ->  proxy
              proxy req -> home server #1
client ret ->  proxy
              proxy ret -> home server #1
             [proxy fails home server #1 for lack of response]
client  <- rej proxy

 That happens for the most part because you played with the
configuration to make the proxy timeouts super-short. As I said, don't
do that.

It does not matter whether the timeouts are short or long. This always happens. See my note above.

In fact, no matter what I set the timeouts to, it always seems to fail the server and reject the request after the first retransmit to the proxy (2 packets, about 10 seconds, regardless of the response_window or zombie_period settings). Yes, a subsequent, different request will go to a different home server, but, again, I want to use the proxy to provide smarter resiliency across a pool of servers. If you know of settings for response_window and zombie_period that I can use that will provide the behavior in my "this is what I want to happen" example, could you provide them please? Because all of the settings I use seem to result in the same behavior.
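
For concreteness, here is roughly the shape of what I have been experimenting with in proxy.conf (addresses and secrets are placeholders, and hs2 is defined like hs1):

```
home_server hs1 {
        type = auth
        ipaddr = 192.0.2.10           # placeholder
        port = 1812
        secret = testing123           # placeholder
        response_window = 5           # no reply in 5s -> server goes "zombie"
        zombie_period = 40            # zombie for 40s without a reply -> "dead"
        status_check = status-server  # probe zombies with Status-Server
}

home_server_pool failover_pool {
        type = fail-over              # try servers in order, not round-robin
        home_server = hs1
        home_server = hs2
}

realm example.com {
        auth_pool = failover_pool
}
```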

Okay, so I obviously do not understand how I can tweak response_window
and zombie_period to make sure that requests that can be serviced by
many possible RADIUS home servers do not return an Access-Reject when
one of those home servers does not respond.

 i.e. you want NO request to fail processing when a home server fails.

 This is extremely difficult to do.  Any naive approach that has quick
failover can have other negative side-effects.  (Additional network
traffic, system load, duplicate processing of requests, etc.)

I guess I do not see those as negatives. That is exactly what I want to happen. RADIUS network traffic is tiny. The system load created by sending multiple requests to a home server, or to a bunch of home servers, is minimal. I do not see how this adds any more load than the alternative, where the proxy sends back an Access-Reject which, in the best case, causes the end-client to re-authenticate and generate yet another request. In the worst case, the client accepts the reject as confirmation that the account cannot be authorized and presents the wrong result to the end-user (whether that is a guy sitting on the end of a dial-up line or a piece of system software trying to determine whether an account is valid). All you are doing is pushing the retry logic from a machine that knows there are multiple possible home servers to a machine that does not, via a response that says, effectively, "Do not retry. Your request is invalid."

Your argument that the RADIUS server cannot handle a retry does not hold water for me, but regardless, I can envision configurations where you would want to minimize all processing by the RADIUS proxy itself (most machines now have far more processing power than a simple RADIUS proxy can consume, so that is not a common need anymore). I wish the option were available. There seem to be knobs for a lot of other things.

The client sends a request to the proxy.  If a home server does not
respond within a short period of time to the request, a second home
server is chosen.  If the second home server does not respond to the
same request, then a third is chosen. This continues until all possible home servers are exhausted. At that point, an Access-Reject packet is sent back to the client. Otherwise, the response from the home server
is sent back to the client.

Doing that requires source code mods, because that quick fail-over can
have negative side-effects.  i.e. The server does NOT support
configurations that can negatively affect its performance.

See my note above for why the work to be done by the server is no more and no less than just returning a reject once the timeout is hit. You are either going to be processing more retries to the home server or more retries from the NAS. Either way, you are going to increase your load.

 On top of that, the "try all possible home servers" is impossible.
There is ALSO a 30 second lifetime for the request.  After 30 seconds,
the NAS has given up, so failing over to another home server is useless.

 On top of that, the NAS will only retry 3-6 times.  So if you have 19
home servers, at *best* it would fail over to 3-6 of them, before the
request is marked "timeout".

Okay, AT BEST you get 3-6 different home servers in a 30-second period. Right now, AT BEST I get 1. Which method is more resilient? Which method results in no false rejections being returned to the NAS? The worst that can happen is that the NAS gets no response, which is exactly what would happen if the NAS queried that one home server directly. The proxy can even be smart about it and only retry a different home server when the NAS retransmits (which I believe it already does), so if the NAS stops retransmitting because it has given up, so does the proxy. But please, let the NAS give up first.

The proxy does not know how many times the NAS will retry. I have my NASes configured to retry for up to 60 seconds, once every 2 seconds. They will retry 30 times. It is more important to me that authentication requests succeed, even if they succeed slowly. It sounds to me like freeradius is making assumptions about how NASes should work, and as a result reducing the flexibility it provides.

 I sincerely hope you see now that the situation is rather more
complicated than the simple "try all home servers" statement.

How do I configure that?  It doesn't seem to matter what I set
response_window or zombie_period to, once the first home server fails to
respond, an Access-Reject (or nothing if I configure a post-proxy
handler) is returned to the client. My client's not going to retry the
request if he gets an Access-Reject, so I need the proxy to retry it.

That last sentence is nonsense. Once the client gets an Access-Reject for *any* reason, it is impossible for the proxy to "retry" that request.

*sigh* Exactly. Once the client gets an Access-Reject, the NAS has told the client that the request is invalid. An end-user querying the NAS gets an error message. A piece of system software querying the NAS gets notified that the account is not valid. The implication is that a retry is futile, even though the account is not actually invalid. The account is perfectly valid. The proxy just gave up too soon (and by too soon, I mean "before it tried more than one of its home servers"). I want the proxy to retry the request to a different home server precisely to prevent the NAS (and thus the client) from getting an Access-Reject when it does not have to. This is typically how load-balancers with failover capability work. They try their best to make sure individual requests succeed when they can.

If you want the proxy to fail over, send it more than ONE request at a
time (like a normal proxying system), and do NOT configure the "do not
respond" policy.

So my NAS now has to send two separate requests for the same authentication, and pick the one that does not come back with an Access-Reject? Which NAS does that? Or are you saying that my end-client has to not accept the fact that he was rejected and keep retrying until he either a) gets an accept or b) gets rejected so many times he accepts it as gospel? Either way, it makes no sense. Either way, the proxy is creating a retry loop.

Again, I am not arguing that the proxy will not fail over. It will for subsequent requests. What a fail-over solution will typically do, though, is fail over even for a given single request, so that all requests are handled as resiliently as possible. In other words, a NAS does not need to see a single failed request from the proxy for the proxy to trigger a failover.

 The proxy WILL fail over, but due to the imperfect nature of the
universe, some requests MAY time out and get rejected.  With a better
detection algorithm, the number of failures might get smaller than it is
today, but it is IMPOSSIBLE to get the number down to zero.

To a NAS, there is a big difference between a timeout and a reject. If it does not get a response, a NAS will typically handle the client differently than if it gets an explicit rejection. Right now, a timeout event from the home server results in an explicit rejection (unless I configure it not to send that reject). It IS possible to get the number down to zero, because I have used RADIUS software that does it. The only time it should ever be non-zero is if all home servers that can possibly be tried in a given window (which might not be all of them, but is most likely going to be more than one of them) fail to respond. Like I said, I am trying to migrate to freeradius for some other features. I have used two other proprietary RADIUS server software packages that implement this behavior.

 No.  RADIUS doesn't work like that.  No amount of magic on the proxy
will cause the NAS to retry forever (which is the only way to have the
proxy cycle through all home servers for one request). If you configure the NAS to retry forever, then all you will do is push network failures
off to some other part of the network.

Right. Precisely. I want to push the network failure handling to the proxy, which has the knowledge that there are multiple points of failure. The NAS does not know that there are 20 possible servers to respond to it. All it knows is that there is 1 RADIUS server it can talk to (the proxy), and if the proxy says the request was rejected, the request is considered rejected. The end-client certainly does not know what can fail. The proxy knows that there are 20 servers. When it decides to fail one server out, it KNOWS a) that the proxied request was not rejected, it just was not responded to by the home server, and b) that it can try that request on another home server before telling the NAS that the request is rejected. Of course, the request has not actually been rejected, since no home server has responded one way or the other yet, and until the proxy responds, the NAS will not know either way.

I also understand that Access-Challenge can complicate the proxying, but that is solvable as well with standard state tracking.

 This is how IP connectivity works: Networks are imperfect.  There is
absolutely nothing you can do about that.

I know that networks are imperfect. The answer to that imperfection is to retry, not to give up. When you tell a NAS that the request has been rejected when, in fact, it has not, you are not effectively retrying. You are saying, "Do not retry. You actually got this failed result."

But look, I have gone through the code. Ivan's right that there is no way to get the behavior I want in freeradius without either a module (I am not sure this is even possible to accomplish via a module, because proxying is not handled via a module) or by hacking the code to change how proxy no-responses are handled. It just frustrates me that you challenge the value of this. For people like me who use freeradius not to serve dial gear but as a robust authentication platform for on-network services, where sending a false rejection to a client is an SLA issue, having a proxy that can robustly and transparently handle transient network failures is very valuable. With that, we do not have to reprogram or replace NAS software (some of which we cannot control) to handle those kinds of transient network failures for us.

Philip
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
