At 10:22 AM 25/07/01 +0000, Miquel van Smoorenburg wrote:
Hi Mike,
Thanks for your replies btw, they're appreciated :) It's so refreshing to
be able to get a reply from someone who actually WROTE a
program.....(unlike most programs written for that "other" OS.....)
>In article <[EMAIL PROTECTED]>,
>Simon Byrnand <[EMAIL PROTECTED]> wrote:
>>We've had some problems with stop records going missing from time to time
>>from a remote NAS, (over which we have no direct control) and when this
>>happens, the user of course shows as online when they're not.
>
>Is there a proxy in between? There are proxies that do not implement
>the radius protocol correctly; they just send the accounting packets
>through once and don't retry if they don't get an ack back.
Yes, there is a proxy in between. I believe it runs "Enhanced Merit AAA
Radius". Both the remote NASes and the radius proxy are owned, located at,
and run by the telco. (Anybody from New Zealand reading this will
immediately know I'm talking about ipnet, and probably be rolling their
eyes ;)
The idea is that local calls are handled by our MAX6000, while calls from
anywhere else nationwide are handled by this ipnet system. Seeing that
the telco has an almost complete monopoly, all ISPs in NZ are pretty much
forced to use ipnet to provide coverage outside their own POP areas.
(Naturally larger ISPs have more of their own POPs, and therefore rely
less on ipnet.) But I digress...
So what is the retransmit schedule of a typical NAS ? Is there any standard
or is it just generally accepted that a NAS should keep trying forever to
get them through, or for some period of time ? I presume the radius proxy
in question USUALLY tries at least 3 times, because I occasionally see an
Acct-Delay-Time of 5 or 10, but that doesn't mean it ALWAYS does, of course,
and it's not clear whether that delay was due to retries between the NASes
(which are located all around the country) and the proxy, (located at a
central telco installation) or the proxy and us.
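As a rough illustration of that inference (a sketch only: the 5-second
retry interval is a guess from the values seen here, not a documented
setting of the proxy, and both function names are made up for this
example). Acct-Delay-Time is the number of seconds the sender has been
trying to deliver the record, so it can be subtracted from the arrival
time to estimate when the event really happened:

```python
def expected_delay_times(attempts, retry_interval=5):
    """Acct-Delay-Time values a sender would report on each attempt,
    assuming a fixed retry interval (an assumption, not a standard)."""
    return [attempt * retry_interval for attempt in range(attempts)]


def estimate_event_time(received_at, acct_delay_time=0):
    """Approximate time of the original event, in epoch seconds:
    arrival time minus the reported Acct-Delay-Time."""
    return received_at - acct_delay_time


# Three attempts at a fixed 5 s interval give delays of 0, 5 and 10.
print(expected_delay_times(3))
```

Seeing a 10 would then suggest a third attempt somewhere along the path,
but it still says nothing about which hop did the retrying.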
>>Now in this situation the times for a user reported by analyzing
>>detail files and the radwtmp are clearly going to be different, and the
>>radwtmp is the more conservative of the two. A program reading the detail
files has no way of knowing that checkrad discovered a stuck session.
>
>True.
>
>>The other thing that happens is that somebody else logs into the same port
>>on the NAS, and radiusd immediately notices the previous session must have
been stuck, however in this case it doesn't zero the time in radwtmp, it
>>just assumes they logged out at the same time the other user logged into
>>the same port.
>
>Yes. Most "last" programs treat this correctly - they notice that
>a port was re-used and 'stop' the session that was active on that port.
Yes. I believe sac also does this, when reading the radwtmp.
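To make the port-reuse rule concrete, here is a minimal Python sketch.
It is not how "last", sac or radiusd actually implement it; the record
layout (dicts with 'type', 'user', 'nas', 'port' and 'time' keys) and
the function name are hypothetical, chosen only for illustration:

```python
def sessions_from_records(records):
    """Pair start/stop records into sessions. If a start arrives for a
    port that still has an open session, close the old session at the
    new login time and flag it as inferred (its stop record was lost)."""
    open_by_port = {}   # (nas, port) -> the start record currently open
    sessions = []
    for rec in sorted(records, key=lambda r: r["time"]):
        key = (rec["nas"], rec["port"])
        current = open_by_port.get(key)
        if rec["type"] == "start":
            if current is not None:
                # Port re-used without a stop: assume logout == new login.
                sessions.append({"user": current["user"],
                                 "login": current["time"],
                                 "logout": rec["time"],
                                 "inferred": True})
            open_by_port[key] = rec
        elif rec["type"] == "stop":
            # Only honour a stop that matches the session open on the port.
            if current is not None and current["user"] == rec["user"]:
                sessions.append({"user": current["user"],
                                 "login": current["time"],
                                 "logout": rec["time"],
                                 "inferred": False})
                del open_by_port[key]
    return sessions
```

The "inferred" flag is there so that a billing run could treat guessed
logout times differently from ones backed by a real stop record.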
>>A program like sac analyzing the detail files _should_ be
>>able to notice this apparent reuse of the same NAS port and deduce that a
lost stop record occurred and give the same result as reading the radwtmp,
>>however I have _not_ confirmed that sac 1.8 actually does this, so at this
>>stage it is supposition.
>
>It should indeed do that, perhaps you can send a polite suggestion
>to the author ?
I'm fairly sure that it already does. Perhaps I'll email the author or try
to construct a test to find out for sure.
>>I've done extensive comparisons of the times calculated from radwtmp, and
>>those calculated from detail files (using sac) for users that haven't
>>suffered lost stop records, and calculated times are _identical_ within
>>about 1 second, and in every case where there were lost stop records, the
>>radwtmp gives the more conservative time of the two. I'd rather err on the
>>safe side when I know a NAS box is giving incomplete data...
>
>On your safe side, not on the customer's .. it is possible that a
>customer logs in at 00:00 AM, logs out at 01:00 AM but the
>stop packet gets lost. Then at 06:00 AM the same port gets reused
>by another dialin customer and the first one gets billed for 6
>hours of usage instead of one.
Exactly. I'm sure this kind of thing is already going on. But by safe side
I mean that the times reported from radwtmp are always _less_ than those
reported by detail in my situation, as usually the user also gets
disconnected when problems are occurring with the accounting (pointing the
finger even more at the telco's system) and when they reconnect checkrad is
detecting a stuck session and zeroing their session time. Still *far* from
ideal, but until the real problem is resolved I'll take the lower usage
times of the two any day.
>If the NAS supports 'alive' packets (or are they called 'update' packets
>now?) perhaps you can get it to send alive packets every say 5 minutes,
>and when an alive packet hasn't been received in the last 15+1 minutes,
>mark the session as 'dead' (and use the last received alive packet as
>STOP packet - it should have the most recent acct-session-time etc).
>
>This could get a bit busy if you have 2000 dialin lines though
I'm not aware of whether their proxy server supports alive packets or not.
It may be worth trying to find out. (But probably difficult. They're very
reticent about any kind of custom configuration) We have only 40 ports
through this system (the rest are all local on a MAX6000 connected directly
to our network, which never gives radius problems) so alive packets may be
a workable interim solution.
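The timeout idea suggested above could be sketched roughly like this in
Python. This is only an illustration of the suggestion, not anything
cistron radiusd is known to do; the session table layout, the key names
and the function name are all assumptions:

```python
ALIVE_INTERVAL = 5 * 60                  # NAS sends an alive packet every 5 minutes
DEAD_AFTER = 3 * ALIVE_INTERVAL + 60     # the "15+1 minutes" grace period


def reap_dead_sessions(sessions, now):
    """sessions maps a session id to a dict with hypothetical keys:
    'last_seen'    epoch seconds of the last start/alive packet received
    'session_time' Acct-Session-Time carried by that packet
    Sessions silent for longer than DEAD_AFTER are removed, and a
    synthetic stop record built from the last alive packet is returned
    for each of them."""
    stops = []
    for sid in list(sessions):
        info = sessions[sid]
        if now - info["last_seen"] > DEAD_AFTER:
            stops.append({"session_id": sid,
                          "session_time": info["session_time"],
                          "stop_time": info["last_seen"],
                          "synthetic": True})
            del sessions[sid]
    return stops
```

With only 40 ports a periodic sweep like this would be cheap; a session
closed this way is billed for at most one alive interval less than it
really used.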
Is there anything special that needs to be configured in radiusd (remember
I'm still running cistron 1.6.4 at the moment) for it to recognise alive
packets ? Or does it automatically know what to do about terminating a
session when alive packets don't arrive ?
Regards,
Simon Byrnand
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html