Hello list,

I'm currently testing freeradius-snapshot-20020114 (configured as a proxy
only) on Solaris 8 and am running into a problem.

radiusd will run for a short (seemingly random) period of time (anywhere
from say 10 seconds to 30 seconds), happily processing requests, until it
simply dies with a signal 9 (SIGKILL??) and core dumps. The problem also
seems to relate to the load put on the server. At 4 or 5 requests per second
it will start to exhibit the crashing behaviour within about 30 seconds. At
2 or 3 requests per second it takes several minutes for the problem to
appear.

So far the problem has appeared only for accounting requests.

Maybe there is a timing issue somewhere, since accounting requests take that
much longer to process as my proxy has to wait for a response to come back
from the second radius server?

Using gdb, it appears that radiusd is crashing in at least a few different
places, which is not very helpful, and kind of suggests it may not be an
actual bug in FreeRADIUS?

Here are three backtraces that I captured:

#0  0x188b4 in proxy_send (request=0x9cb18) at proxy.c:317
317                     request->proxy->timestamp = request->timestamp;
(gdb) bt
#0  0x188b4 in proxy_send (request=0x9cb18) at proxy.c:317
#1  0x15480 in rad_respond (request=0x9cb18, fun=0x170a0 <rad_accounting>)
    at radiusd.c:1527
#2  0x1ecf8 in request_handler_thread (arg=0x98110) at threads.c:169

----------------------------------------------------------------------------

#0  0xff141da4 in t_delete () from /usr/lib/libc.so.1
(gdb) bt
#0  0xff141da4 in t_delete () from /usr/lib/libc.so.1
#1  0xff141998 in realfree () from /usr/lib/libc.so.1
#2  0xff14226c in cleanfree () from /usr/lib/libc.so.1
#3  0xff1413a0 in _malloc_unlocked () from /usr/lib/libc.so.1
#4  0xff141294 in malloc () from /usr/lib/libc.so.1
#5  0x22538 in rad_decode (packet=0xa00f8, original=0xa3b68, 
    secret=0x98dec "gloople") at radius.c:1060
#6  0x15208 in rad_respond (request=0x98da0, fun=0x170a0 <rad_accounting>)
    at radiusd.c:1437
#7  0x1ecf8 in request_handler_thread (arg=0x982f0) at threads.c:169

----------------------------------------------------------------------------

#0  0x23220 in pairfind (first=0x190, attr=41) at valuepair.c:97
97                      first = first->next;
(gdb) bt
#0  0x23220 in pairfind (first=0x190, attr=41) at valuepair.c:97
#1  0x1888c in proxy_send (request=0x9d728) at proxy.c:312
#2  0x15480 in rad_respond (request=0x9d728, fun=0x170a0 <rad_accounting>)
    at radiusd.c:1527
#3  0x1ecf8 in request_handler_thread (arg=0xa70f0) at threads.c:169

This appears to point back to the threading, but whether it is a Solaris
issue or a FreeRADIUS issue I'm not really sure.
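
To make clear what I mean by a threading problem: the sketch below is not
FreeRADIUS code, just a minimal illustration (every name in it is
hypothetical) of a linked list shared between threads. With the mutex held
around every access the list stays consistent. My guess is that without such
locking, one thread could free a node that another is still walking, which
would look a lot like the pairfind and malloc crashes above.

```c
/* Build (assumed): cc -DDEMO_MAIN -o listdemo listdemo.c -lpthread */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an attribute list node (not FreeRADIUS's own type). */
typedef struct node {
    int attr;
    struct node *next;
} node_t;

static node_t *head = NULL;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add a node at the front of the shared list. */
static void list_push(int attr)
{
    node_t *n = malloc(sizeof(*n));
    if (n == NULL)
        abort();
    n->attr = attr;
    pthread_mutex_lock(&list_lock);
    n->next = head;
    head = n;
    pthread_mutex_unlock(&list_lock);
}

/* Unlink and free the front node; returns 1 if one was removed, 0 if empty. */
static int list_pop(void)
{
    node_t *n;
    pthread_mutex_lock(&list_lock);
    n = head;
    if (n != NULL)
        head = n->next;
    pthread_mutex_unlock(&list_lock);
    if (n == NULL)
        return 0;
    free(n);
    return 1;
}

/* Count nodes with the lock held, so nobody can free a node mid-walk. */
static int list_length(void)
{
    int len = 0;
    node_t *n;
    pthread_mutex_lock(&list_lock);
    for (n = head; n != NULL; n = n->next)
        len++;
    pthread_mutex_unlock(&list_lock);
    return len;
}

/* Each worker repeatedly adds one node and then removes one. */
static void *worker(void *arg)
{
    int i, iterations = *(int *)arg;
    for (i = 0; i < iterations; i++) {
        list_push(i);
        list_pop();
    }
    return NULL;
}

/* Run nthreads workers; returns the final list length, or -1 on error. */
int run_workers(int nthreads, int iterations)
{
    pthread_t *tids = malloc((size_t)nthreads * sizeof(*tids));
    int i, started = 0;
    if (tids == NULL)
        return -1;
    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &iterations) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    free(tids);
    return (started == nthreads) ? list_length() : -1;
}

#ifdef DEMO_MAIN
int main(void)
{
    int remaining = run_workers(4, 100000);
    assert(remaining == 0);
    printf("nodes left on list: %d\n", remaining);
    return 0;
}
#endif
```

Every push is matched by a pop in the same thread, so with the locking in
place the list should end up empty. Taking the lock/unlock calls out is the
experiment I would expect to reproduce random crashes in malloc/free or in
the list walk, though I have not confirmed that this is what radiusd is
actually doing.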

The log files don't appear (to me) to give a definitive answer to what is
happening here, except that at the time of the "crash", I'm getting
incomplete attribute logging such as:

Thread 2 handling request 167, (17 handled so far)
        Proxy-State = 0x313639
Sending Accounting-Response of id 169 to 203.108.109.27:62729
Finished request 167
Going to the next request
Thread 2 waiting to be assigned a request
NAS-IP-Address = 203.108.109.27
         = 1
         = Async
         = Start
         = "123"
        Proxy-State = "169"
         = UNKNOWN-TYPE

When I run the server with the "-s" option it seems to run fine and does not
exhibit this behaviour.

Once again this appears to point towards a problem with threading?

I know there are at least a couple of people running FreeRADIUS on Solaris 8,
and I'm just wondering if anyone can point me in a direction to start
looking for the problem, or if there is a known issue with Solaris that
requires a patch or something similar?


Many thanks for your time and effort,
Michael

- 
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
