Re: FreeRADIUS crashing on Solaris 8

2002-02-25 Thread Alan DeKok

[EMAIL PROTECTED] (Rainer Clasen) wrote:
> Even with the changes from the radiusd.c you sent me, this goto is still
> triggered.

  *sigh*  I think it's the threading problems.  The server still uses
a few functions which aren't thread-safe, and they should be made
thread-safe.

  e.g. gmtime(), etc.

  I've made a number of changes to the code in CVS.  It should be
*more* thread-safe than what it is right now.  I don't know if it's
perfect yet.  I'll have to audit the rest of the code, and that can
take a while.

   Alan DeKok.
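
For readers following the gmtime() remark: gmtime() returns a pointer to
static storage that all threads share, while POSIX gmtime_r() writes into a
struct tm supplied by the caller. The following is only a minimal sketch of
that substitution, not the change that went into CVS; the helper name
format_utc() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format a timestamp as UTC without touching the
 * static buffer that gmtime() shares between threads.
 * Returns 0 on success, -1 on failure.
 */
int format_utc(time_t when, char *out, size_t outlen)
{
	struct tm tm_buf;	/* per-call storage, nothing shared across threads */

	if (gmtime_r(&when, &tm_buf) == NULL)
		return -1;

	/* strftime() returns 0 when the result does not fit in 'out' */
	if (strftime(out, outlen, "%Y-%m-%d %H:%M:%S", &tm_buf) == 0)
		return -1;

	return 0;
}
```

Because each call owns its struct tm, two threads formatting timestamps at
the same moment cannot overwrite each other's result.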

- 
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html



Re: FreeRADIUS crashing on Solaris 8

2002-02-25 Thread Todd T. Fries

While non-optimal, would a mutex lock around non threadsafe functions be a
viable workaround?  It at least allowed a program I've written to function
safely ..

Penned by Alan DeKok on Mon, Feb 25, 2002 at 05:44:48PM -0500, we have:
| [EMAIL PROTECTED] (Rainer Clasen) wrote:
| > Even with the changes from the radiusd.c you sent me, this goto is still
| > triggered.
| 
|   *sigh*  I think it's the threading problems.  The server still uses
| a few functions which aren't thread-safe, and they should be made
| thread-safe.
| 
|   e.g. gmtime(), etc.
| 
|   I've made a number of changes to the code in CVS.  It should be
| *more* thread-safe than what it is right now.  I don't know if it's
| perfect yet.  I'll have to audit the rest of the code, and that can
| take a while.
| 
|   Alan DeKok.

-- 
Todd Fries .. [EMAIL PROTECTED]




Re: FreeRADIUS crashing on Solaris 8

2002-02-25 Thread Alan DeKok

Todd T. Fries [EMAIL PROTECTED] wrote:
> While non-optimal, would a mutex lock around non threadsafe functions be a
> viable workaround?  It at least allowed a program I've written to function
> safely ..

  That's about as much work as fixing the code to use the thread-safe
functions, instead of the non-thread-safe functions.
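
To make the comparison concrete, here is a rough sketch of the mutex
workaround being discussed, assuming POSIX threads. The wrapper name
locked_gmtime() and the single global lock are illustrative assumptions,
not code from the server.

```c
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* One lock guarding every call to the non-reentrant gmtime(). */
static pthread_mutex_t gmtime_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical wrapper: call gmtime() under a mutex and copy the
 * result out of libc's static buffer before releasing the lock.
 * Returns 0 on success, -1 if gmtime() fails.
 */
int locked_gmtime(time_t when, struct tm *result)
{
	struct tm *shared;
	int rcode = -1;

	pthread_mutex_lock(&gmtime_lock);
	shared = gmtime(&when);
	if (shared != NULL) {
		*result = *shared;	/* copy while still holding the lock */
		rcode = 0;
	}
	pthread_mutex_unlock(&gmtime_lock);

	return rcode;
}
```

The lock only helps if every caller in the program goes through the wrapper,
so each call site still has to be found and edited, which is the same audit
that switching to the reentrant functions requires.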


  Hmm... one approach to fixing the problem would be to edit all of
the modules to set the 'non thread safe' flag.  If that makes the
problem less significant on Solaris, then it's definitely the thread
code causing the problems.

  Alan DeKok.




Re: FreeRADIUS crashing on Solaris 8

2002-02-20 Thread Rainer Clasen

Alan DeKok wrote:
>   Hmm... I would suggest going to src/main/radiusd.c, function
> refresh_request().  Look for:
[...]
> and add:
> 
>   if (request->reply && (request->reply->code != 0)) goto setup_timeout;

I've added this to the version already running with your fix from
yesterday.

Now I'm waiting for results. I'll keep you informed.

I suppose I'll find some time on the weekend to migrate my logging
patches to the latest CVS.

Rainer

-- 
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0  B0E1 0556 E25A 7599 75BD




Re: FreeRADIUS crashing on Solaris 8

2002-02-20 Thread Alan DeKok

[EMAIL PROTECTED] (Rainer Clasen) wrote:
> I've taken the changes from CVS and applied them to my patched version
> (only logging enhancements). proxy.c:1.52-1.53, radiusd.c:1.238-1.239.
> 
> I got the following backtrace:
> 
...
> (gdb) p *request
> $1 = {magic = 16909060, packet = 0x0, proxy = 0x0, reply = 0x0,


  And that's the magic number saying that src/main/util.c, function
request_free() was called on the request.

  That is, SOMETHING deleted the request while it was still alive.
That can ONLY result from a call to request_free(), or rl_delete().

  If you add log messages to log the request number before any call to
request_free() or rl_delete() (only in radiusd.c), then you can at
least tell *which* call resulted in it deleting an active request.
That will help a lot in tracking down the problem.
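
As a rough illustration of the two ideas above (a magic value that marks a
deleted request, and a log line before each deletion), here is a
self-contained sketch. The value 16909060 from the gdb output is 0x01020304
in hex; everything else, including the struct, the function names, and the
"alive" marker 0xdeadbeef, is an assumption made for the example and is not
taken from util.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_MAGIC_LIVE  0xdeadbeefu	/* assumed "alive" marker */
#define SKETCH_MAGIC_FREED 0x01020304u	/* 16909060, as seen in the gdb output */

/* Hypothetical stand-in for the server's request structure. */
struct sketch_request {
	unsigned int magic;
	int number;
};

struct sketch_request *sketch_request_alloc(int number)
{
	struct sketch_request *req = malloc(sizeof(*req));

	if (req == NULL)
		return NULL;
	req->magic = SKETCH_MAGIC_LIVE;
	req->number = number;
	return req;
}

/* Returns 1 if the request still carries the "alive" marker. */
int sketch_request_is_live(const struct sketch_request *req)
{
	return req->magic == SKETCH_MAGIC_LIVE;
}

/*
 * Log which call site is deleting which request, then stamp the
 * structure so that later use of a stale pointer is recognisable.
 * The memory is deliberately not released here so the stamp stays
 * readable in this sketch; a real free routine would also call free().
 */
void sketch_request_retire(struct sketch_request *req, const char *caller)
{
	fprintf(stderr, "%s: deleting request %d\n", caller, req->number);
	req->magic = SKETCH_MAGIC_FREED;
}
```

With a distinct caller string at each deletion site, the last log line
naming a given request number shows which path deleted it while a thread
was still working on it.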

  Another suggestion is to start the server, and then add an entry in
'radiusd.conf', which is 'debug_level = 2'.  Send a HUP signal to the
server, and you will get all of the debugging logs going to the log
file, and it will still run in threaded mode.

  There will be a *lot* of log messages, though.


> again, the proxysecret belongs to a server marked dead immediately
> before the crash. 

  That shouldn't be a problem...

> But the secret belongs to a NAS usually not used by users of this realm.
> There was no matching entry (realm + NAS) in the logfile.

  That's a problem.

> In all cases it was due to non-auth requests.

  That's another piece of the puzzle, which is important.  Accounting
requests are deleted immediately after a response is sent to the NAS,
as there can be no duplicate accounting requests.

  It may be deleting the request too soon...

  Another suggestion is to go to radiusd.c, around line 2377.  It says
"Cleaning up request %d ID %d with timestamp %08lx".  Change the 'if'
condition before that so it does NOT check for PW_ACCOUNTING_REQUEST.
If that makes the difference, then it's narrowed down.


  I'll take another look at the code to see what the heck is going on.

  Alan DeKok.




Re: FreeRADIUS crashing on Solaris 8

2002-02-20 Thread Rainer Clasen

Hello, first of all!

Rainer Clasen wrote:
> Alan DeKok wrote:
> >   Hmm... I would suggest going to src/main/radiusd.c, function
> > refresh_request().  Look for:
> [...]
> > and add:
> > 
> >   if (request->reply && (request->reply->code != 0)) goto setup_timeout;
> 
> I've added this to the version already running with your fix from
> yesterday.
> 
> Now I'm waiting for results. I'll keep you informed.

Ok, the new goto is triggered quite often. But the daemon still dies. 


Rainer

-- 
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0  B0E1 0556 E25A 7599 75BD




Re: FreeRADIUS crashing on Solaris 8

2002-02-20 Thread Rainer Clasen

Hello, first of all!

Rainer Clasen wrote:
> Rainer Clasen wrote:
> > Alan DeKok wrote:
> > >   Hmm... I would suggest going to src/main/radiusd.c, function
> > > refresh_request().  Look for:
> > [...]
> > > and add:
> > > 
> > >   if (request->reply && (request->reply->code != 0)) goto setup_timeout;
> > 
> > I've added this to the version already running with your fix from
> > yesterday.
> > 
> > Now I'm waiting for results. I'll keep you informed.
> 
> Ok, the new goto is triggered quite often. But the daemon still dies. 

Even with the changes from the radiusd.c you sent me, this goto is still
triggered.


Rainer

-- 
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0  B0E1 0556 E25A 7599 75BD




Re: FreeRADIUS crashing on Solaris 8

2002-02-19 Thread Alan DeKok

[EMAIL PROTECTED] (Rainer Clasen) wrote:
> Mitchell, Michael wrote:
> > I'm currently testing freeradius-snapshot-20020114 (configured as a proxy
> > only) on Solaris 8 and running into a problem.
> 
> I'm seeing similar crashes with my patched 2002-02-11 version on Solaris
> 7.

  Hmm... I'm starting to think that this may *not* be a thread
problem.

>   simul_count = 0, simul_mpp = 0, finished = 1, options = 0


  The request has been marked 'finished', so NOTHING should be using
any entries of the request.  There should be NO threads or anything
else which is working with the request.

> Hmm, why is request->proxy == NULL?

  Because the request is finished, and many fields can be cleaned up.

> It is malloced in proxy_send() and never reset. rad_send(), which is
> called a few lines before the crash, doesn't seem to modify it either
> (well, actually it can't, as it doesn't get a pointer to the
> pointer).

  Once an active request has been marked finished, then any logic
about what's supposed to happen goes out the window.


  The only thing I can think to do is to mark up the request as alive,
while a thread is processing it.  Then, add assertions that
request->finished isn't marked '1', while the thread is alive.
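
One way to picture the suggestion above is the sketch below. It assumes a
new per-request flag (here called in_use) next to the existing finished
flag; the struct and function names are hypothetical, and a real threaded
server would also need to protect these flags with a lock or atomics.

```c
#include <assert.h>

/* Hypothetical subset of the request structure. */
struct tracked_request {
	int finished;	/* set to 1 once the request is complete */
	int in_use;	/* assumed new field: 1 while a thread owns the request */
};

/* Called by a worker thread when it picks the request up. */
void tracked_request_begin(struct tracked_request *req)
{
	assert(!req->finished);	/* a finished request must never be handed out */
	req->in_use = 1;
}

/* Called wherever the thread is about to touch the request. */
void tracked_request_check(const struct tracked_request *req)
{
	assert(req->in_use);
	assert(!req->finished);	/* fires if something finished it underneath us */
}

/* Called by the same thread when it is done; only now may it be finished. */
void tracked_request_end(struct tracked_request *req)
{
	assert(req->in_use);
	req->in_use = 0;
	req->finished = 1;
}
```

An assertion failure in tracked_request_check() would stop the server at the
first point where a live request is seen as finished, rather than some time
later at an unrelated dereference.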

> I've seen the suggestion of running radiusd with -s. What side effects
> do I have to expect?

  It will probably be slower.  But it shouldn't be too serious in the
short term.

  Alan DeKok.




Re: FreeRADIUS crashing on Solaris 8

2002-02-19 Thread Rainer Clasen

Mitchell, Michael wrote:
> I'm currently testing freeradius-snapshot-20020114 (configured as a proxy
> only) on Solaris 8 and running into a problem.

I'm seeing similar crashes with my patched 2002-02-11 version on Solaris
7.


 

#0  0x19510 in proxy_send (request=0x1b13250) at proxy.c:398
398         request->proxy->timestamp = request->timestamp - (delaypair ? delaypair->lvalue : 0);
(gdb) p delaypair
$1 = (VALUE_PAIR *) 0x3c7244b9
(gdb) p *request
$2 = {magic = 16909060, packet = 0x0, proxy = 0x0, reply = 0x0, proxy_reply = 0x0, config_items = 0x0,
  username = 0x2047108, password = 0x0, secret = "x", '\000' <repeats 22 times>, child_pid = 0,
  timestamp = 1014121657, number = 61812, proxysecret = "xxx\000\000x\000x",
  proxy_is_replicate = 0, proxy_try_count = 7, proxy_next_try = 1014121662, simul_max = 0,
  simul_count = 0, simul_mpp = 0, finished = 1, options = 0, container = 0x41b20}
(gdb) bt
#0  0x19510 in proxy_send (request=0x1b13250) at proxy.c:398
#1  0x15a24 in rad_respond (request=0x1b13250, fun=0x17838 <rad_accounting>) at radiusd.c:1521
#2  0x1fe60 in request_handler_thread (arg=0xa7bc0) at threads.c:169


Hmm, why is request->proxy == NULL? It is malloced in proxy_send() and
never reset. rad_send(), which is called a few lines before the crash,
doesn't seem to modify it either (well, actually it can't, as it doesn't
get a pointer to the pointer).




#0  0x19530 in proxy_send (request=0xa00218) at proxy.c:403
403         if ( mainconfig.log_proxy_nonauth || (request->packet->code == PW_AUTHENTICATION_REQUEST)) {
(gdb) bt
#0  0x19530 in proxy_send (request=0xa00218) at proxy.c:403
#1  0x15a24 in rad_respond (request=0xa00218, fun=0x17838 <rad_accounting>) at radiusd.c:1521
#2  0x1fe60 in request_handler_thread (arg=0x4b8130) at threads.c:169
(gdb) p mainconfig
$1 = {log_auth = 1, log_auth_badpass = 1, log_auth_goodpass = 1, do_usercollide = 0, log_proxy = 1,
  log_proxy_retransmit = 1, log_proxy_nonauth = 0, do_lower_user = 0x9d208 "no",
  do_lower_pass = 0x9d218 "no", do_nospace_user = 0x9d228 "no", do_nospace_pass = 0x9d238 "no",
  nospace_time = 0x0}
(gdb) p *request
$2 = {magic = 0, packet = 0x0, proxy = 0x129030, reply = 0x0, proxy_reply = 0x9fec48, config_items = 0x0,
  username = 0x0, password = 0x0, secret = "xxx", '\000' <repeats 24 times>, child_pid = 0,
  timestamp = 1014136041, number = 25277,
  proxysecret = "xxx\000xxx\000\000\000\000\000", proxy_is_replicate = 0,
  proxy_try_count = 7, proxy_next_try = 1014136046, simul_max = 0, simul_count = 0, simul_mpp = 0,
  finished = 1, options = 10486288, container = 0x41790}


Ok, this happens in code I've added. But shouldn't request->packet
always be non-NULL? Well, and it was already used successfully in
proxy_send() a few lines before.




I've seen the suggestion of running radiusd with -s. What side effects
do I have to expect?




Rainer

-- 
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0  B0E1 0556 E25A 7599 75BD




Re: FreeRADIUS crashing on Solaris 8

2002-02-19 Thread Rainer Clasen

Rainer Clasen wrote:
> in both cases the only server for a realm was marked dead immediately
> before the crash. 

I forgot to mention: This server has the secret, which is in
request->proxy_secret when the daemon dies. So it seems to be a packet
related to the same server which was marked dead.


Rainer

-- 
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0  B0E1 0556 E25A 7599 75BD




Re: FreeRADIUS crashing on Solaris 8

2002-02-19 Thread Alan DeKok

[EMAIL PROTECTED] (Rainer Clasen) wrote:
> in both cases the only server for a realm was marked dead immediately
> before the crash. 

  On further examination, the code in rad_respond() does NOT check for
errors returned from proxy_send().  So if the request is marked to be
proxied, and the realm is dead, then something wrong happens.
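
The shape of the problem can be sketched as follows. This is not the fix
that was committed; the structures, field names, and return conventions
below are assumptions chosen to show a caller that checks the result of a
send routine which can fail when the realm's only server is dead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structures and signatures differ. */
struct sketch_realm {
	int alive;	/* 0 once the home server has been marked dead */
};

struct sketch_proxy_request {
	const struct sketch_realm *realm;
	int proxied;	/* set to 1 once the packet has been handed to the proxy */
};

/* Returns 0 on success, -1 if the request cannot be proxied. */
int sketch_proxy_send(struct sketch_proxy_request *req)
{
	if (req->realm == NULL || !req->realm->alive)
		return -1;	/* no live home server: report it, do not proceed */

	req->proxied = 1;
	return 0;
}

/*
 * Caller that checks the result instead of assuming success.
 * Returns 1 if the request was proxied, 0 if it has to be handled
 * (or rejected) locally.
 */
int sketch_respond(struct sketch_proxy_request *req)
{
	if (sketch_proxy_send(req) < 0)
		return 0;	/* leave the proxy-only fields alone on failure */

	return 1;
}
```

Without the check, the caller would carry on as though the proxy fields had
been filled in, which matches the NULL request->proxy seen in the backtraces.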

  I'll commit a bug fix now.  Grab the CVS snapshot from tonight.

  If this fixes your problem, I think we should release 0.5 ASAP.

  Alan DeKok.




Re: FreeRADIUS crashing on Solaris 8

2002-01-16 Thread aland

Mitchell, Michael [EMAIL PROTECTED] wrote:
> Setting cleanup_delay to 0 seems to have helped to alleviate the
> problem, but it does not prevent it. The server tends to run longer before
> crashing - we're into the minutes now rather than seconds - but the problem
> is definitely still there.

  Damn.  It's probably some weird race condition...

> I saw an email in the archives from Alan that from memory said the server
> should not be considered stable running non-threaded. Is this still the
> case, or has this code been cleaned up now? It was a reasonably old email I
> think.

  If you run it in '-s' mode, it should be fine.

  Alan DeKok.




RE: FreeRADIUS crashing on Solaris 8

2002-01-15 Thread Mitchell, Michael

Hi Randy,

Thanks for your help.

Setting cleanup_delay to 0 seems to have helped to alleviate the
problem, but it does not prevent it. The server tends to run longer before
crashing - we're into the minutes now rather than seconds - but the problem
is definitely still there.


I saw an email in the archives from Alan that from memory said the server
should not be considered stable running non-threaded. Is this still the
case, or has this code been cleaned up now? It was a reasonably old email I
think.

Thanks,
Michael


> -----Original Message-----
> From: Randy Moore [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, 16 January 2002 11:15
> To: [EMAIL PROTECTED]
> Subject: Re: FreeRADIUS crashing on Solaris 8
> 
> 
> Hi,
> 
> In radiusd.conf, try setting 'cleanup_delay' to 0 or even -1.  It seems to
> be related to some different segfaults I've been seeing.  Mine have to do
> with CHAP requests and SQL authentication.
 
> At 07:47 AM 1/16/2002 +0800, you wrote:
> > Hi,
> > 
> > I have the exact same problem (almost the same gdb trace), running on Linux
> > too.
> > 
> > It seems that proxy.c is giving the problem. I set up two clients
> > (using radclient) to send the acct records to the radius daemon, which
> > then proxied to another server. With 1 client everything is fine. Once
> > the next client kicks in, the daemon crashes.
> > 
> > /hh
> > 
> > On Wed, 16 Jan 2002, Mitchell, Michael wrote:
 
> > > Hello list,
> > > 
> > > I'm currently testing freeradius-snapshot-20020114 (configured as a proxy
> > > only) on Solaris 8 and running into a problem.
> > > 
> > > radiusd will run for a short (seemingly random) period of time (anywhere
> > > from say 10 seconds to 30 seconds), happily processing requests until it
> > > simply dies with a signal 9 (SIGKILL??) and core dumps. The problem also
> > > seems to relate to the load put on the server. At 4 or 5 requests per second
> > > it will start to exhibit the crashing behaviour within about 30 seconds. At
> > > 2 or 3 requests per second it is taking several minutes for the problem to
> > > appear.
> > > 
> > > The problem also has only so far appeared for accounting requests.
> > > 
> > > Maybe there is a timing issue somewhere, since accounting requests take that
> > > much longer to process as my proxy has to wait for a response to come back
> > > from the second radius server?
> > > 
> > > Using gdb it appears that radiusd is crashing at at least a few different
> > > places, which is not very helpful, and kind of suggests it may not be an
> > > actual bug in FreeRADIUS?
> > > 
> > > Here are three back traces that I captured:
  
> > > #0  0x188b4 in proxy_send (request=0x9cb18) at proxy.c:317
> > > 317         request->proxy->timestamp = request->timestamp;
> > > (gdb) bt
> > > #0  0x188b4 in proxy_send (request=0x9cb18) at proxy.c:317
> > > #1  0x15480 in rad_respond (request=0x9cb18, fun=0x170a0 <rad_accounting>) at radiusd.c:1527
> > > #2  0x1ecf8 in request_handler_thread (arg=0x98110) at threads.c:169
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > #0  0xff141da4 in t_delete () from /usr/lib/libc.so.1
> > > (gdb) bt
> > > #0  0xff141da4 in t_delete () from /usr/lib/libc.so.1
> > > #1  0xff141998 in realfree () from /usr/lib/libc.so.1
> > > #2  0xff14226c in cleanfree () from /usr/lib/libc.so.1
> > > #3  0xff1413a0 in _malloc_unlocked () from /usr/lib/libc.so.1
> > > #4  0xff141294 in malloc () from /usr/lib/libc.so.1
> > > #5  0x22538 in rad_decode (packet=0xa00f8, original=0xa3b68, secret=0x98dec "gloople") at radius.c:1060
> > > #6  0x15208 in rad_respond (request=0x98da0, fun=0x170a0 <rad_accounting>) at radiusd.c:1437
> > > #7  0x1ecf8 in request_handler_thread (arg=0x982f0) at threads.c:169
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > #0  0x23220 in pairfind (first=0x190, attr=41) at valuepair.c:97
> > > 97          first = first->next;
> > > (gdb) bt
> > > #0  0x23220 in pairfind (first=0x190, attr=41) at valuepair.c:97
> > > #1  0x1888c in proxy_send (request=0x9d728) at proxy.c:312
> > > #2  0x15480 in rad_respond (request=0x9d728, fun=0x170a0 <rad_accounting>) at radiusd.c:1527
> > > #3  0x1ecf8 in request_handler_thread (arg=0xa70f0) at threads.c:169
  
> > > This appears to point back to the threading, but whether it is a Solaris
> > > issue or a FreeRADIUS issue I'm not really sure.
> > > 
> > > The log files don't appear (to me) to give a definitive answer to what is
> > > happening here, except that at the time of the crash, I'm getting
> > > incomplete attribute logging such as:
> > > 
> > > Thread 2 handling request 167, (17 handled so far)
> > > Proxy-State = 0x313639
> > > Sending Accounting-Response of id 169 to 203.108.109.27:62729
> > > Finished request 167
> > > Going to the next request
> > > Thread 2 waiting to be assigned a request
> > > NAS-IP-Address = 203.108.109.27
> > > = 1
> > > = Async
> > > = Start
> > > = 123
> > > Proxy-State = 169
> > > = UNKNOWN-TYPE
> > > 
> > > When I run the server with the -s option