Re: Error: CHILD: exit on signal (11)

2002-08-27 Thread John

Just got back from a 2 week sabbatical, hoping to pick up where I left off.


John [EMAIL PROTECTED] wrote:
 After running flawlessly for a couple of weeks, suddenly and inexplicably, the
 radius server started spawning processes and reached the maximum default of 32
 (continued running), complained about unresponsive child processes, and then
 died with signal 11.

|That's most likely due to a back-end database locking, or a bug in
|the server.   I would suggest upgrading to 0.7, as it has more bug fixes.  Also,
|ensure that you've deleted all old 'rlm' modules from the system. 

The version I am running is 0.7 (I upgraded to .7 from .6 originally before writing 
into the list).  However, I wasn't sure if I had deleted the rlm modules, so I did 
that yesterday (actually, I did a fresh install), and the problem still persists.  I 
looked through the cvs logs and have not seen any work done to rlm_ldap, or at 
least nothing as far as bug fixes since 0.7.  Reading through the other replies, 
the symptoms are very similar to the ones seen by Todd Fries in:
http://lists.cistron.nl/archives/freeradius-users/2002/08/frm01266.html with the sql 
module.  

Any thoughts?
-- 
John Hogenmiller, kb3dfz
Systems Administrator, Pennswoods.net
877.716.2002 ext 529
---
Chris then consulted his Friend *snip*, a fellow co worker
and he to then thought of making this a success.

- 
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html



Re: Error: CHILD: exit on signal (11)

2002-08-27 Thread Kostas Kalevras

On Tue, 27 Aug 2002, John wrote:

 Just got back from a 2 week sabbatical, hoping to pick up where I left off.


 John [EMAIL PROTECTED] wrote:
  After running flawlessly for a couple of weeks, suddenly and inexplicably, the
  radius server started spawning processes and reached the maximum default of 32
  (continued running), complained about unresponsive child processes, and then
  died with signal 11.

 |That's most likely due to a back-end database locking, or a bug in
 |the server.   I would suggest upgrading to 0.7, as it has more bug fixes.  Also,
 |ensure that you've deleted all old 'rlm' modules from the system.

 The version I am running is 0.7 (I upgraded to .7 from .6 originally before writing
 into the list).  However, I wasn't sure if I had deleted the rlm modules, so I did
 that yesterday (actually, I did a fresh install), and the problem still persists.  I
 looked through the cvs logs and have not seen any work done to rlm_ldap, or at
 least nothing as far as bug fixes since 0.7.  Reading through the other replies,
 the symptoms are very similar to the ones seen by Todd Fries in:
 http://lists.cistron.nl/archives/freeradius-users/2002/08/frm01266.html with the sql
 module.

 Any thoughts?

The ldap module should be able to tolerate bad ldap servers. If anything goes
wrong again, post the radius logs, and try to get a core dump and do a backtrace
(set the allow_core_dumps = yes directive in radiusd.conf).

Also make sure that max_request_time is considerably larger than the timeouts
defined in the ldap module configuration. Probably something like
max_request_time = net_timeout + timeout + 10
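
As a rough illustration of the rule of thumb Kostas gives, the settings might line up as follows. The numeric values are assumptions chosen only to make the arithmetic visible, not recommended defaults; the directive names are the ones mentioned in this message.

```
# radiusd.conf (illustrative values only)
allow_core_dumps = yes

# net_timeout (10) + timeout (20) + 10 seconds of headroom
max_request_time = 40

modules {
	ldap {
		# ... other ldap settings ...
		timeout = 20
		net_timeout = 10
	}
}
```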

--
Kostas Kalevras Network Operations Center
[EMAIL PROTECTED]  National Technical University of Athens, Greece
Work Phone: +30 10 7721861
'Go back to the shadow' Gandalf






Re: Error: CHILD: exit on signal (11)

2002-08-22 Thread Todd T. Fries

oof. Sorry.  Thanks.
-- 
Todd Fries .. [EMAIL PROTECTED]

(last updated $ToddFries: signature.p,v 1.2 2002/03/19 15:10:18 todd Exp $)

Penned by [EMAIL PROTECTED] on Thu, Aug 22, 2002 at 09:17:44AM +0500, we have:
| On Wed, Aug 21, 2002 at 02:36:08PM -0500, Todd T. Fries wrote:
|  On a side note, perhaps I should release the socket only when the access of
|  the 'row' pointer is done?  Or perhaps the api should be altered (again) to
|  pass a pointer array into fetch_row so that the socket can be released without
|  the potential for over-writing prior results?
| This question was discussed at 
| http://www.mail-archive.com/freeradius-users@lists.cistron.nl/msg07312.html
| 
| -- 
| Denis Tatarskikh [UdSU/MF] [UdSU/IC]mailto:[EMAIL PROTECTED]



Re: Error: CHILD: exit on signal (11)

2002-08-22 Thread Alan DeKok

[EMAIL PROTECTED] wrote:
 On Wed, Aug 21, 2002 at 02:36:08PM -0500, Todd T. Fries wrote:
  On a side note, perhaps I should release the socket only when the access of
  the 'row' pointer is done?  Or perhaps the api should be altered (again) to
  pass a pointer array into fetch_row so that the socket can be released without
  the potential for over-writing prior results?
 This question was discussed at 
 http://www.mail-archive.com/freeradius-users@lists.cistron.nl/msg07312.html

  I'll add:

http://www.freeradius.org/cvs-log/2002-07-26.08:00:02.html

  Look for the word 'dendy'

  Alan DeKok.




Re: Error: CHILD: exit on signal (11)

2002-08-21 Thread Todd T. Fries

I've delved into this.  The perl script that is 'mysqlhotbackup' does the
following:


.. and thus it locks it (from my experience) for read only.

So I'm testing with:

while :; do sleep $[ $RANDOM / 8192 ]; rcmysql stop; rcmysql start; done

This tends to tickle things.

I've run 'gdb ./radiusd' then '(gdb) run -f' and I am at the following
prompt:

[New Thread 31776 (LWP 27266)]
Delayed SIGSTOP caught for LWP 27266.
LWP 26565 exited.
Cannot find thread 33: invalid thread handle
(gdb) tr
trace command requires an argument
(gdb) bt
#0  0x401c0931 in __linuxthreads_create_event () from /lib/libpthread.so.0
#1  0x401ba3bb in pthread_handle_create () from /lib/libpthread.so.0
#2  0x401b9d2d in __pthread_manager () from /lib/libpthread.so.0
#3  0x401b9e3d in __pthread_manager_event () from /lib/libpthread.so.0
(gdb) 

This doesn't look very familiar to me, perhaps I am in some other thread's
context and not the place where the problem occurred?
-- 
Todd Fries .. [EMAIL PROTECTED]

(last updated $ToddFries: signature.p,v 1.2 2002/03/19 15:10:18 todd Exp $)

Penned by Alan DeKok on Mon, Aug 19, 2002 at 02:42:19PM -0400, we have:
| Todd T. Fries [EMAIL PROTECTED] wrote:
|  It seems to happen when the database is doing a hot-backup and is
|  unresponsive/slow for a few (10-15) minutes.
| 
|   If authorization depends on that database, and it goes down for
| 10-15 minutes, then there's not much point in running the server
| during that time.
| 
|   If the MySQL server really does disappear during backups, I'd
| suggest doing something else to keep the RADIUS alive...
| 
| 
|  Mon Aug 19 00:16:47 2002 : Error: rlm_sql:  There are no DB handles to use!
|  Mon Aug 19 00:17:37 2002 : Error: CHILD: exit on signal (11)
| 
|   Hmm.. that's an unchecked de-referencing of a NULL pointer
| somewhere.  Without more information, it's hard to know where.
| 
|   Alan DeKok.
| 



Re: Error: CHILD: exit on signal (11)

2002-08-21 Thread Todd T. Fries

..more..

(gdb) bt full
#0  rlm_sql_authorize (instance=0x42735fd0, request=0x42a5bf74)
    at rlm_sql.c:492
        check_tmp = (VALUE_PAIR *) 0x0
        reply_tmp = (VALUE_PAIR *) 0x0
        passwd_item = (VALUE_PAIR *) 0x42a81034
        found = 1
        sqlsocket = (SQLSOCK *) 0x427d1fe8
        row = 0x42a81034
        querystr = "SELECT Value,Attribute FROM radcheck WHERE UserName = 'toddtest' AND ( Attribute = 'User-Password' OR Attribute = 'Password' OR Attribute = 'Crypt-Password' ) ORDER BY Attribute DESC\000ergroup.GroupName"...
        ret = 0
        sqlusername = "toddtest", '\000' <repeats 509 times>
#1  0x080569f0 in call_modsingle (component=1, sp=0x42729fcc,
    request=0x42a5bf74, default_result=6) at modcall.c:211
        component = 1
        sp = (modsingle *) 0x42729fcc
        request = (REQUEST *) 0x42a5bf74
        myresult = 1118158708
#2  0x08056b68 in modcall (component=1, c=0x42729fcc, request=0x42a5bf74)
    at modcall.c:315
        sp = (modsingle *) 0x42a81034
        c = (modcallable *) 0x42729fcc
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) print row
$1 = 0x42a81034
(gdb) print *row
$2 = 0x42a81040 "XKgM9N6tR3Xw2"
(gdb) print row[0]
$3 = 0x42a81040 "XKgM9N6tR3Xw2"
(gdb)

-- 
Todd Fries .. [EMAIL PROTECTED]

(last updated $ToddFries: signature.p,v 1.2 2002/03/19 15:10:18 todd Exp $)

Penned by Alan DeKok on Mon, Aug 19, 2002 at 02:42:19PM -0400, we have:
| Todd T. Fries [EMAIL PROTECTED] wrote:
|  It seems to happen when the database is doing a hot-backup and is
|  unresponsive/slow for a few (10-15) minutes.
| 
|   If authorization depends on that database, and it goes down for
| 10-15 minutes, then there's not much point in running the server
| during that time.
| 
|   If the MySQL server really does disappear during backups, I'd
| suggest doing something else to keep the RADIUS alive...
| 
| 
|  Mon Aug 19 00:16:47 2002 : Error: rlm_sql:  There are no DB handles to use!
|  Mon Aug 19 00:17:37 2002 : Error: CHILD: exit on signal (11)
| 
|   Hmm.. that's an unchecked de-referencing of a NULL pointer
| somewhere.  Without more information, it's hard to know where.
| 
|   Alan DeKok.
| 



Re: Error: CHILD: exit on signal (11)

2002-08-21 Thread Todd T. Fries

The code path this follows is ..

rlm_sql.c:static int rlm_sql_authorize(void *instance, REQUEST * request) {
[..]
   ret = rlm_sql_fetch_row(sqlsocket, inst);
      sql_mysql.c:int sql_fetch_row(SQLSOCK * sqlsocket, SQL_CONFIG *config) {
         rlm_sql_mysql_sock *mysql_sock = sqlsocket->conn;

         sqlsocket->row = mysql_fetch_row(mysql_sock->result);

         if (sqlsocket->row == NULL) {
            return sql_check_error(mysql_errno(mysql_sock->sock));
         }
         return 0;
      }

   if (ret) {
      radlog(L_ERR, "rlm_sql_authorize: query failed");
      return RLM_MODULE_FAIL;
   }

   row = sqlsocket->row;
   if (row == NULL) {
      radlog(L_ERR, "rlm_sql_authorize: no rows returned from query (no such user)");
      return RLM_MODULE_OK;
   }

   if (row[0] == NULL) {
      radlog(L_ERR, "rlm_sql_authorize: row[0] returned NULL.");
      return RLM_MODULE_OK;
   }
   if ((passwd_item = pairmake("User-Password", row[0], T_OP_SET)) != NULL)
      pairadd(&request->config_items, passwd_item);


Now please help me check whether I'm understanding this right.  It would
appear that some kind of failure is happening in mysql_fetch_row and that,
instead of returning NULL, it is returning freed memory.  At least my research
suggests it SHOULD return NULL on any failure, or valid, allocated memory
on success ...

http://www.mysql.com/doc/en/mysql_fetch_row.html

On a side note, perhaps I should release the socket only when the access of
the 'row' pointer is done?  Or perhaps the api should be altered (again) to
pass a pointer array into fetch_row so that the socket can be released without
the potential for over-writing prior results?
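
One way to picture the second idea Todd raises here is to deep-copy the column values into caller-owned memory before the socket goes back to the pool, so that a later query on that socket cannot overwrite them. The sketch below is plain C with no MySQL dependency; `demo_row_dup` and `demo_row_free` are hypothetical helper names, not part of rlm_sql, and the column count is assumed to be known to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Free a row copy made by demo_row_dup (hypothetical helper). */
void demo_row_free(char **copy, size_t ncols)
{
    if (copy == NULL)
        return;
    for (size_t i = 0; i < ncols; i++)
        free(copy[i]);          /* free(NULL) is a no-op */
    free(copy);
}

/*
 * Deep-copy ncols column strings into caller-owned memory.
 * NULL columns (SQL NULL) stay NULL in the copy.
 * Returns NULL if any allocation fails.
 */
char **demo_row_dup(char *const *row, size_t ncols)
{
    char **copy = calloc(ncols ? ncols : 1, sizeof(*copy));
    if (copy == NULL)
        return NULL;

    for (size_t i = 0; i < ncols; i++) {
        if (row[i] == NULL)
            continue;
        size_t len = strlen(row[i]) + 1;
        copy[i] = malloc(len);
        if (copy[i] == NULL) {
            demo_row_free(copy, i);   /* release what was copied so far */
            return NULL;
        }
        memcpy(copy[i], row[i], len);
    }
    return copy;
}
```

With something along these lines, the caller would copy the row, release the socket, and only then build attribute pairs from the copy. The cost is one allocation per column per request, which is the trade-off against holding the socket longer.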
-- 
Todd Fries .. [EMAIL PROTECTED]

(last updated $ToddFries: signature.p,v 1.2 2002/03/19 15:10:18 todd Exp $)

Penned by Todd T. Fries on Wed, Aug 21, 2002 at 01:54:34PM -0500, we have:
| ..more..



Re: Error: CHILD: exit on signal (11)

2002-08-21 Thread dendy

On Wed, Aug 21, 2002 at 02:36:08PM -0500, Todd T. Fries wrote:
 On a side note, perhaps I should release the socket only when the access of
 the 'row' pointer is done?  Or perhaps the api should be altered (again) to
 pass a pointer array into fetch_row so that the socket can be released without
 the potential for over-writing prior results?
This question was discussed at 
http://www.mail-archive.com/freeradius-users@lists.cistron.nl/msg07312.html

-- 
Denis Tatarskikh [UdSU/MF] [UdSU/IC]mailto:[EMAIL PROTECTED]






Error: CHILD: exit on signal (11)

2002-08-19 Thread Todd T. Fries

This is a log excerpt from a server using freeradius 0.7 authenticating
against mysql .. does anyone have a pointer for where I should start digging?

It seems to happen when the database is doing a hot-backup and is
unresponsive/slow for a few (10-15) minutes.

The 'useful' options from sql.conf should be:

connect_failure_retry_delay = 15
num_sql_socks = 18
sqltrace = yes
sqltracefile = ${logdir}/sqltrace.sql
sql_user_name = %{Stripped-User-Name:-%{User-Name:-none}}
# Uncomment simul_count_query to enable simultaneous use checking
simul_count_query = SELECT COUNT(*) FROM ${acct_table1} WHERE UserName=...
simul_verify_query = SELECT RadAcctId, AcctSessionId, UserName, NASIPAd...
simul_zap_query = DELETE FROM ${acct_table1} WHERE RadAcctId = '%s'


Mon Aug 19 00:16:15 2002 : Error: WARNING: Unresponsive child (id 55299) for request 4861
Mon Aug 19 00:16:18 2002 : Error: WARNING: Unresponsive child (id 53252) for request 4862
Mon Aug 19 00:16:18 2002 : Error: WARNING: Unresponsive child (id 54274) for request 4863
Mon Aug 19 00:16:21 2002 : Error: WARNING: Unresponsive child (id 56325) for request 4864
Mon Aug 19 00:16:21 2002 : Error: WARNING: Unresponsive child (id 57350) for request 4865
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 59400) for request 4867
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 60425) for request 4868
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 61450) for request 4869
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 62475) for request 4870
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 63500) for request 4871
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 58375) for request 4866
Mon Aug 19 00:16:44 2002 : Error: rlm_sql:  There are no DB handles to use!
Mon Aug 19 00:16:44 2002 : Error: WARNING: Unresponsive child (id 64525) for request 4872
Mon Aug 19 00:16:47 2002 : Error: WARNING: Unresponsive child (id 65550) for request 4873
Mon Aug 19 00:16:47 2002 : Error: rlm_sql:  There are no DB handles to use!
Mon Aug 19 00:17:37 2002 : Error: CHILD: exit on signal (11)
Mon Aug 19 06:38:43 2002 : Info: rlm_sql: Driver rlm_sql_mysql loaded and linked
Mon Aug 19 06:38:43 2002 : Info: rlm_sql: Attempting to connect to dbuser@dbserver:/db
Mon Aug 19 06:38:43 2002 : Info: rlm_sql: Starting connect to MySQL server for #0

-- 
Todd Fries .. [EMAIL PROTECTED]

Monte R. Lee and Company

Work   405-842-2405
Fax    405-848-8018

Web    www.mrleng.com

$ToddFries: signature.w,v 1.5 2002/05/23 19:48:19 todd Exp $





Re: Error: CHILD: exit on signal (11)

2002-08-19 Thread Alan DeKok

Todd T. Fries [EMAIL PROTECTED] wrote:
 It seems to happen when the database is doing a hot-backup and is
 unresponsive/slow for a few (10-15) minutes.

  If authorization depends on that database, and it goes down for
10-15 minutes, then there's not much point in running the server
during that time.

  If the MySQL server really does disappear during backups, I'd
suggest doing something else to keep the RADIUS alive...


 Mon Aug 19 00:16:47 2002 : Error: rlm_sql:  There are no DB handles to use!
 Mon Aug 19 00:17:37 2002 : Error: CHILD: exit on signal (11)

  Hmm.. that's an unchecked de-referencing of a NULL pointer
somewhere.  Without more information, it's hard to know where.
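
A minimal sketch of the kind of check being described, assuming the crash comes from using a handle that the pool failed to supply. All names here (`demo_pool`, `demo_get_handle`, `demo_authorize`) are hypothetical stand-ins, not the rlm_sql API; the point is only that the caller tests for NULL and fails the request instead of dereferencing.

```c
#include <stddef.h>

#define DEMO_POOL_SIZE 2

enum { DEMO_OK = 0, DEMO_FAIL = -1 };

/* Hypothetical pooled database handle. */
typedef struct {
    int in_use;
} demo_handle;

typedef struct {
    demo_handle handles[DEMO_POOL_SIZE];
} demo_pool;

/* Return a free handle, or NULL when every handle is busy. */
demo_handle *demo_get_handle(demo_pool *pool)
{
    for (size_t i = 0; i < DEMO_POOL_SIZE; i++) {
        if (!pool->handles[i].in_use) {
            pool->handles[i].in_use = 1;
            return &pool->handles[i];
        }
    }
    return NULL;    /* the "no DB handles to use" case */
}

void demo_release_handle(demo_handle *h)
{
    if (h != NULL)
        h->in_use = 0;
}

/* Fail the request cleanly when no handle is available. */
int demo_authorize(demo_pool *pool)
{
    demo_handle *h = demo_get_handle(pool);
    if (h == NULL)
        return DEMO_FAIL;   /* never touch h past this point */

    /* ... a real module would run its query with h here ... */

    demo_release_handle(h);
    return DEMO_OK;
}
```

Whether the actual fault in the server is on this path is not established in the thread; the backtrace posted later is what would confirm or rule it out.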

  Alan DeKok.




Re: Error: CHILD: exit on signal (11)

2002-08-19 Thread Josh Wilsdon

 Todd T. Fries [EMAIL PROTECTED] wrote:
  It seems to happen when the database is doing a hot-backup and is
  unresponsive/slow for a few (10-15) minutes.
 
  Mon Aug 19 00:16:47 2002 : Error: rlm_sql:  There are no DB handles to use!
  Mon Aug 19 00:17:37 2002 : Error: CHILD: exit on signal (11)
 
   Hmm.. that's an unchecked de-referencing of a NULL pointer
 somewhere.  Without more information, it's hard to know where.
 

This is something that we have been dealing with as well.  Whenever
the database becomes unavailable, (once it was a switch that died, once 
it was a hard drive, etc..) the radius server segfaults.  Sometimes it
does so immediately, and sometimes when the connection is 
re-established.  This has been the case with version 0.4, 0.5 and 0.6 
and some CVS versions in between.  I never did have enough time to track 
it down, so we've just been running a keepalive script so that when it 
dies, it comes back up.  I could (when doing testing before) reproduce 
this at will by bringing the connection to the database down, and 
sending queries to the radius server.  This was the case with both the 
postgres and oracle modules anyway.
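
The keepalive approach Josh mentions can be sketched in C as a small supervisor that forks the worker, waits for it, and restarts it when it was killed by a signal. This is an assumption about what such a script does, not Josh's actual script; `demo_run_once`, `demo_keepalive` and the two child functions are hypothetical, and a real wrapper would exec radiusd rather than call a function.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Demo child that dies the way the server does in this thread. */
void demo_crash(void)
{
    signal(SIGSEGV, SIG_DFL);   /* make sure the default action applies */
    raise(SIGSEGV);
}

/* Demo child that finishes normally. */
void demo_clean(void)
{
}

/*
 * Run child_fn in a forked process.
 * Returns the number of the signal that killed it, 0 if it exited
 * on its own, or -1 if fork/waitpid failed.
 */
int demo_run_once(void (*child_fn)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        child_fn();
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/*
 * Keep restarting the child while it keeps dying on a signal,
 * up to max_restarts times. Returns how many restarts were done.
 */
int demo_keepalive(void (*child_fn)(void), int max_restarts)
{
    int restarts = 0;
    while (demo_run_once(child_fn) > 0 && restarts < max_restarts)
        restarts++;
    return restarts;
}
```

A restart cap (and, in practice, a delay between restarts) keeps a persistently failing server from spinning. As Josh notes, this only masks the segfault; it does not fix it.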

Hope that helps,
  Josh Wilsdon

-- 
Josh Wilsdon [EMAIL PROTECTED] Programmer Analyst
Wizard IT Services - http://www.wizard.ca 
Linux Support Specialist - http://linuxmagic.com
Unix Administration, Website Hosting, Network Services, Programming
(604) 589-0037 Beautiful British Columbia, Canada
LinuxMagic is a TradeMark of Wizard Tower TechnoServices Ltd.

This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to which
they are addressed.  If you have received this email in error please
notify the system manager.  Please note that any views or opinions
presented in this email are solely those of the author and do not
necessarily represent those of the company.




Re: Error: CHILD: exit on signal (11)

2002-02-11 Thread Alan DeKok

Eric Dean [EMAIL PROTECTED] wrote:
 Anybody know of a good way I can debug this so that I can let everyone
 know the source of this problem?

  gdb?  See doc/BUGS

  Alan DeKok.




Error: CHILD: exit on signal (11)

2002-02-11 Thread Eric Dean


Anybody know of a good way I can debug this so that I can let everyone
know the source of this problem?





Error: CHILD: exit on signal (11)

2002-02-07 Thread Eric Dean


Anybody know of a good way I can debug this so that I can let everyone
know the source of this problem?

