On 06/08/12 16:11, Christopher Manigan wrote:
So that eliminates any malformed/invalid/zero response issues. As for the errors I see in the logs, I do not believe it to be a slow database. The database is responsive to other queries against the radius database while we experience timeouts and crashes.
Unless you are running the same type of queries against the same tables, that doesn't mean much. SQL servers are capable of parallel operation, and read and write queries behave differently, of course.
Alan's suggestion is a good one - when people report this problem it's almost always a slow SQL server. Specifically, it's usually people who are putting their accounting into SQL but aren't maintaining the SQL tables, e.g. there are too few or too many indices, they're not archiving off old rows, etc.
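As a rough illustration of the index point, here is a minimal Python sketch. Everything in it is an assumption for demonstration purposes: it uses an in-memory SQLite database as a stand-in for your real SQL server, a heavily simplified "radacct"-style table, a made-up lookup query, and a hypothetical index name. It only shows how the same lookup goes from a full table scan to an index search; check the queries your own server actually issues before changing anything.

```python
import sqlite3
import time


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable plan detail is the 4th column of each row.
    return " | ".join(row[3] for row in rows)


def timed(conn, sql, params=()):
    """Run a query once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    conn.execute(sql, params).fetchall()
    return time.perf_counter() - start


# In-memory SQLite stands in for the real database (assumption).
conn = sqlite3.connect(":memory:")

# Simplified accounting table; a real schema has many more columns.
conn.execute(
    "CREATE TABLE radacct ("
    "radacctid INTEGER PRIMARY KEY, username TEXT, acctstoptime TEXT)"
)

# Fill it with rows that were never archived off.
conn.executemany(
    "INSERT INTO radacct (username, acctstoptime) VALUES (?, ?)",
    (
        (f"user{i % 5000}", None if i % 100 == 0 else "2012-01-01")
        for i in range(50000)
    ),
)

# Hypothetical lookup: find a user's open sessions.
lookup = (
    "SELECT radacctid FROM radacct "
    "WHERE username = ? AND acctstoptime IS NULL"
)
args = ("user100",)

plan_before = query_plan(conn, lookup, args)
time_before = timed(conn, lookup, args)

# Hypothetical index name; added only to compare the two plans.
conn.execute("CREATE INDEX idx_radacct_username ON radacct (username)")

plan_after = query_plan(conn, lookup, args)
time_after = timed(conn, lookup, args)

print("before:", plan_before, f"({time_before:.6f}s)")
print("after: ", plan_after, f"({time_after:.6f}s)")
```

The exact plan wording and the timings vary by SQLite version and machine, so treat the numbers as indicative only. On a real server the equivalent check is to run that database's own EXPLAIN on the queries the accounting module issues.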
The other thing to check is the "radutmp" module, which is very slow when the "utmp" file is large, and almost always unused and/or inferior to SQL. Other things to check are LDAP queries or "exec" scripts.
I assume you are running 2.1.12, and not an older version (which might contain bugs, but probably not ones which cause this behaviour).
Do you have any suggestions on how we might troubleshoot that end of it?
Either run the server in debug mode with "radiusd -X" and look at how it responds under load, or use standard system admin tools (top, vmstat, iostat, strace, etc.) to determine load patterns.

