Alan DeKok <[email protected]> writes:
> Bjørn Mork wrote:
>> Bjørn Mork <[email protected]> writes:
>>> The server had been running for 45 hours when this happened. I haven't
>>> got the faintest idea where to start looking for the bug.
>>
>> I have to correct myself after looking over the logs: The server
>> stopped answering authentication requests, but it continued to answer
>> accounting requests.
>
> Found, fixed, pushed to "v2.1.x" on github.

Yes, now it continues to answer both authentication and accounting
requests, but it still stops proxying after a while (where "a while"
might be something like 20+ hours and 1+ million auth requests - I have
no indication that these values are fixed).

The symptoms are that all home servers are marked dead/zombie. Typical
obfuscated home_server list in this state:

server(bjorn) ~ 71$ radmin -e "show home_server list"
192.168.8.120   1812    auth    alive   0
192.168.8.120   1813    acct    alive   0
192.168.8.246   1812    auth    alive   0
192.168.8.246   1813    acct    alive   0
192.168.8.132   1645    auth    dead    0
192.168.8.132   1646    acct    alive   0
192.168.8.132   1645    auth    dead    3
192.168.8.132   1646    acct    alive   0
192.168.8.14    1812    auth    alive   0
192.168.8.14    1813    acct    zombie  0
192.168.8.10    1812    auth    alive   0
192.168.8.10    1813    acct    zombie  0
192.168.8.210   1812    auth    alive   0
192.168.8.210   1813    acct    alive   0
192.168.8.50    1812    auth    zombie  0
192.168.8.50    1813    acct    alive   0
192.168.8.20    1812    auth    zombie  0
192.168.8.20    1813    acct    alive   0
192.168.8.40    1812    auth    zombie  0
192.168.8.40    1813    acct    alive   0
192.168.8.44    1812    auth    alive   0
192.168.8.44    1813    acct    alive   0
192.168.8.216   1812    auth    zombie  0
192.168.8.216   1813    acct    zombie  0
192.168.8.218   1812    auth    alive   0
192.168.8.218   1813    acct    zombie  0
192.168.8.1     1645    auth    zombie  0
192.168.8.1     1646    acct    zombie  4
192.168.8.137   1645    auth    alive   1
192.168.8.137   1646    acct    dead    0
192.168.8.150   1812    auth    zombie  0
192.168.8.150   1813    acct    alive   0
192.168.8.158   1812    auth    zombie  0
192.168.8.158   1813    acct    zombie  0
192.168.8.222   1812    auth    zombie  0
192.168.8.222   1813    acct    zombie  0
192.168.8.6     1812    auth    zombie  6
192.168.8.6     1813    acct    alive   0
192.168.8.27    1812    auth    zombie  2
192.168.8.27    1813    acct    zombie  0
192.168.8.158   1812    auth    zombie  0
192.168.8.158   1813    acct    zombie  0
192.168.8.4     1812    auth    alive   0
192.168.8.4     1813    acct    zombie  0
192.168.9.6     1812    auth    zombie  4
192.168.9.6     1813    acct    zombie  0

There are a number of servers marked "alive", but these are all servers
which have been revived after the fixed period. When used, they will be
marked dead/zombie again.

I'm running the v2.1.x branch from github, with
cbbcb5232261c5b28093c3a97d6da2a16c9e06af being the last commit.

Now, I wish I could say that I was sure that some other version did not
have the same problem, but I'm not. I'm afraid I haven't been running
any of them continuously for a long enough period to be completely sure.
But I will test that now, starting with the stable branch from
git.freeradius.org, commit d7b4f003477644978f3fefa694305dce9b5dc8bf,
which was the last point where things seemed to work.


Bjørn
-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
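
For anyone wanting to track when the home servers start going dead/zombie during a long test run, the listing above can be summarized with a short script. This is only a rough sketch: it assumes each row has the five whitespace-separated columns seen in the output above (address, port, auth/acct, state, and a trailing number whose meaning is not stated here). The function name `tally_states` and the sample rows are made up for illustration and are not part of FreeRADIUS.

```python
from collections import Counter


def tally_states(listing: str) -> dict:
    """Count rows per (type, state) in text shaped like the listing above.

    Assumption: five whitespace-separated fields per row, with the third
    being "auth" or "acct" and the fourth being the state string.
    """
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Skip the shell prompt, blank lines, and anything else that does
        # not look like a home server row.
        if len(fields) != 5 or fields[2] not in ("auth", "acct"):
            continue
        counts[(fields[2], fields[3])] += 1
    return dict(counts)


# A few rows copied from the listing, used as sample input.
sample = """\
192.168.8.120 1812 auth alive 0
192.168.8.132 1645 auth dead 3
192.168.8.14 1813 acct zombie 0
192.168.8.50 1812 auth zombie 0
"""

if __name__ == "__main__":
    for (kind, state), n in sorted(tally_states(sample).items()):
        print(f"{kind} {state}: {n}")
```

Feeding it the captured output of the radmin command at regular intervals would give a rough timeline of how many auth and acct home servers are in each state.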

