Re: Error: CHILD: exit on signal (11)
Just got back from a two-week sabbatical, hoping to pick up where I left off.

John [EMAIL PROTECTED] wrote:
> After running flawlessly for a couple of weeks, suddenly and inexplicably,
> the radius server started spawning processes and reached the maximum
> default of 32 (continued running), complained about unresponsive child
> processes, and then died with signal 11.

|That's most likely due to a back-end database locking, or a bug in
|the server. I would suggest upgrading to 0.7, as it has more bug fixes. Also,
|ensure that you've deleted all old 'rlm' modules from the system.

The version I am running is 0.7 (I upgraded from 0.6 to 0.7 before first
writing to the list). However, I wasn't sure whether I had deleted the old rlm
modules, so I did that yesterday (actually, I did a fresh install), and the
problem still persists. I looked through the CVS logs and have not seen any
work done on rlm_ldap, or at least no bug fixes since 0.7.

Reading through the other replies, the symptoms are very similar to the ones
seen by Todd Fries with the sql module in:
http://lists.cistron.nl/archives/freeradius-users/2002/08/frm01266.html

Any thoughts?

--
John Hogenmiller, kb3dfz
Systems Administrator, Pennswoods.net
877.716.2002 ext 529
---
Chris then consulted his Friend *snip*, a fellow co worker and he to then
thought of making this a success.

-
List info/subscribe/unsubscribe? See http://www.freeradius.org/list/users.html
Re: Error: CHILD: exit on signal (11)
On Tue, 27 Aug 2002, John wrote:
> Just got back from a two-week sabbatical, hoping to pick up where I left off.
>
> John [EMAIL PROTECTED] wrote:
> > After running flawlessly for a couple of weeks, suddenly and inexplicably,
> > the radius server started spawning processes and reached the maximum
> > default of 32 (continued running), complained about unresponsive child
> > processes, and then died with signal 11.
>
> |That's most likely due to a back-end database locking, or a bug in
> |the server. I would suggest upgrading to 0.7, as it has more bug fixes. Also,
> |ensure that you've deleted all old 'rlm' modules from the system.
>
> The version I am running is 0.7 (I upgraded from 0.6 to 0.7 before first
> writing to the list). However, I wasn't sure whether I had deleted the old
> rlm modules, so I did that yesterday (actually, I did a fresh install), and
> the problem still persists. I looked through the CVS logs and have not seen
> any work done on rlm_ldap, or at least no bug fixes since 0.7.
>
> Reading through the other replies, the symptoms are very similar to the ones
> seen by Todd Fries with the sql module in:
> http://lists.cistron.nl/archives/freeradius-users/2002/08/frm01266.html
>
> Any thoughts?

The ldap module should be able to tolerate bad ldap servers. If anything goes
wrong again, post the radius logs, and try to find a core dump and do a
backtrace (set the allow_core_dumps = yes directive in radiusd.conf).

Also make sure that max_request_time is comfortably larger than the timeouts
defined in the ldap module configuration. Probably:

    max_request_time = net_timeout + timeout + 10

--
Kostas Kalevras
Network Operations Center [EMAIL PROTECTED]
National Technical University of Athens, Greece
Work Phone: +30 10 7721861
'Go back to the shadow' Gandalf
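The rule of thumb above is simple arithmetic, and it can be sanity-checked before editing radiusd.conf. A minimal sketch in C, assuming all three values are in seconds; the function names are hypothetical and the sample numbers in the note below are purely illustrative, not defaults taken from the server:

```c
/* Suggested lower bound from the message above:
 *   max_request_time = net_timeout + timeout + 10
 * All values are assumed to be in seconds. */
int suggested_max_request_time(int net_timeout, int timeout)
{
    return net_timeout + timeout + 10;
}

/* Returns 1 if the configured max_request_time leaves at least the
 * suggested margin over the ldap module timeouts, 0 otherwise. */
int max_request_time_ok(int max_request_time, int net_timeout, int timeout)
{
    return max_request_time >= suggested_max_request_time(net_timeout, timeout);
}
```

For example, with a hypothetical net_timeout of 1 and timeout of 20, the suggested bound is 31, so a max_request_time of 30 would be slightly too low by this rule.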
Re: Error: CHILD: exit on signal (11)
oof. Sorry. Thanks.
--
Todd Fries .. [EMAIL PROTECTED]
(last updated $ToddFries: signature.p,v 1.2 2002/03/19 15:10:18 todd Exp $)

Penned by [EMAIL PROTECTED] on Thu, Aug 22, 2002 at 09:17:44AM +0500, we have:
| On Wed, Aug 21, 2002 at 02:36:08PM -0500, Todd T. Fries wrote:
| > On a side note, perhaps I should release the socket only when the access of
| > the 'row' pointer is done? Or perhaps the api should be altered (again) to
| > pass a pointer array into fetch_row so that the socket can be released
| > without the potential for over-writing prior results?
| This question was discussed at
| http://www.mail-archive.com/freeradius-users@lists.cistron.nl/msg07312.html
|
| --
| Denis Tatarskikh [UdSU/MF] [UdSU/IC] mailto:[EMAIL PROTECTED]
Re: Error: CHILD: exit on signal (11)
[EMAIL PROTECTED] wrote:
> On Wed, Aug 21, 2002 at 02:36:08PM -0500, Todd T. Fries wrote:
> > On a side note, perhaps I should release the socket only when the access of
> > the 'row' pointer is done? Or perhaps the api should be altered (again) to
> > pass a pointer array into fetch_row so that the socket can be released
> > without the potential for over-writing prior results?
> This question was discussed at
> http://www.mail-archive.com/freeradius-users@lists.cistron.nl/msg07312.html

I'll add:

http://www.freeradius.org/cvs-log/2002-07-26.08:00:02.html

Look for the word 'dendy'

Alan DeKok.
Re: Error: CHILD: exit on signal (11)
I've delved into this. The perl script that is 'mysqlhotbackup' does the
following:

..

and thus it locks it (from my experience) for read only. So I'm testing with:

    while :; do sleep $[ $RANDOM / 8192 ]; rcmysql stop; rcmysql start; done

This tends to tickle things. I've run 'gdb ./radiusd' then '(gdb) run -f' and
I am at the following prompt:

    [New Thread 31776 (LWP 27266)]
    Delayed SIGSTOP caught for LWP 27266.
    LWP 26565 exited.
    Cannot find thread 33: invalid thread handle
    (gdb) tr
    trace command requires an argument
    (gdb) bt
    #0  0x401c0931 in __linuxthreads_create_event () from /lib/libpthread.so.0
    #1  0x401ba3bb in pthread_handle_create () from /lib/libpthread.so.0
    #2  0x401b9d2d in __pthread_manager () from /lib/libpthread.so.0
    #3  0x401b9e3d in __pthread_manager_event () from /lib/libpthread.so.0
    (gdb)

This doesn't look very familiar to me; perhaps I am in some other thread's
context and not the place where the problem occurred?
--
Todd Fries .. [EMAIL PROTECTED]
(last updated $ToddFries: signature.p,v 1.2 2002/03/19 15:10:18 todd Exp $)

Penned by Alan DeKok on Mon, Aug 19, 2002 at 02:42:19PM -0400, we have:
| Todd T. Fries [EMAIL PROTECTED] wrote:
| > It seems to happen when the database is doing a hot-backup and is
| > unresponsive/slow for a few (10-15) minutes.
|
| If authorization depends on that database, and it goes down for
| 10-15 minutes, then there's not much point in running the server
| during that time.
|
| If the MySQL server really does disappear during backups, I'd
| suggest doing something else to keep the RADIUS alive...
|
| > Mon Aug 19 00:16:47 2002 : Error: rlm_sql: There are no DB handles to use!
| > Mon Aug 19 00:17:37 2002 : Error: CHILD: exit on signal (11)
|
| Hmm.. that's an unchecked de-referencing of a NULL pointer
| somewhere. Without more information, it's hard to know where.
|
| Alan DeKok.
Re: Error: CHILD: exit on signal (11)
..more..

    (gdb) bt full
    #0  rlm_sql_authorize (instance=0x42735fd0, request=0x42a5bf74)
        at rlm_sql.c:492
            check_tmp = (VALUE_PAIR *) 0x0
            reply_tmp = (VALUE_PAIR *) 0x0
            passwd_item = (VALUE_PAIR *) 0x42a81034
            found = 1
            sqlsocket = (SQLSOCK *) 0x427d1fe8
            row = 0x42a81034
            querystr = "SELECT Value,Attribute FROM radcheck WHERE UserName = 'toddtest' AND ( Attribute = 'User-Password' OR Attribute = 'Password' OR Attribute = 'Crypt-Password' ) ORDER BY Attribute DESC\000ergroup.GroupName"...
            ret = 0
            sqlusername = "toddtest", '\000' <repeats 509 times>
    #1  0x080569f0 in call_modsingle (component=1, sp=0x42729fcc,
        request=0x42a5bf74, default_result=6) at modcall.c:211
            component = 1
            sp = (modsingle *) 0x42729fcc
            request = (REQUEST *) 0x42a5bf74
            myresult = 1118158708
    #2  0x08056b68 in modcall (component=1, c=0x42729fcc, request=0x42a5bf74)
        at modcall.c:315
            sp = (modsingle *) 0x42a81034
            c = (modcallable *) 0x42729fcc
    ---Type <return> to continue, or q <return> to quit---q
    Quit
    (gdb) print row
    $1 = 0x42a81034
    (gdb) print *row
    $2 = 0x42a81040 "XKgM9N6tR3Xw2"
    (gdb) print row[0]
    $3 = 0x42a81040 "XKgM9N6tR3Xw2"
    (gdb)

--
Todd Fries .. [EMAIL PROTECTED]
(last updated $ToddFries: signature.p,v 1.2 2002/03/19 15:10:18 todd Exp $)
Re: Error: CHILD: exit on signal (11)
The code path this follows is ..

rlm_sql.c:

    static int rlm_sql_authorize(void *instance, REQUEST * request) {
    [..]
        ret = rlm_sql_fetch_row(sqlsocket, inst);

sql_mysql.c:

            int sql_fetch_row(SQLSOCK * sqlsocket, SQL_CONFIG *config) {
                rlm_sql_mysql_sock *mysql_sock = sqlsocket->conn;

                sqlsocket->row = mysql_fetch_row(mysql_sock->result);

                if (sqlsocket->row == NULL) {
                    return sql_check_error(mysql_errno(mysql_sock->sock));
                }
                return 0;
            }

        if (ret) {
            radlog(L_ERR, "rlm_sql_authorize: query failed");
            return RLM_MODULE_FAIL;
        }
        row = sqlsocket->row;
        if (row == NULL) {
            radlog(L_ERR, "rlm_sql_authorize: no rows returned from query (no such user)");
            return RLM_MODULE_OK;
        }
        if (row[0] == NULL) {
            radlog(L_ERR, "rlm_sql_authorize: row[0] returned NULL.");
            return RLM_MODULE_OK;
        }
        if ((passwd_item = pairmake("User-Password", row[0], T_OP_SET)) != NULL)
            pairadd(&request->config_items, passwd_item);

Now please help me check whether I'm understanding this right. It would
appear some kind of failure is happening in mysql_fetch_row, and instead of
returning NULL it is returning free'd memory. At least my research suggests it
SHOULD return NULL on any failure, or valid, allocated memory on success ...

http://www.mysql.com/doc/en/mysql_fetch_row.html

On a side note, perhaps I should release the socket only when the access of
the 'row' pointer is done? Or perhaps the api should be altered (again) to
pass a pointer array into fetch_row so that the socket can be released without
the potential for over-writing prior results?
--
Todd Fries .. [EMAIL PROTECTED]
(last updated $ToddFries: signature.p,v 1.2 2002/03/19 15:10:18 todd Exp $)
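The second idea in the side note (give fetch_row caller-owned storage, so the socket can be released before the row is used) could look roughly like the sketch below. This is an illustration only: `copy_row` and `free_row_copy` are hypothetical helpers, not part of rlm_sql or the MySQL client library, and the row is modelled as a plain `char **` of NUL-terminated strings, which is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Copy the first `ncols` fields of a driver-owned row into caller-owned
 * storage, so the caller no longer depends on the driver's buffers.
 * SQL NULL fields stay NULL. Returns 0 on success, -1 on failure. */
int copy_row(char **row, int ncols, char **out)
{
    int i;

    if (row == NULL || out == NULL)
        return -1;

    for (i = 0; i < ncols; i++) {
        size_t len;

        if (row[i] == NULL) {
            out[i] = NULL;
            continue;
        }
        len = strlen(row[i]) + 1;
        out[i] = malloc(len);
        if (out[i] == NULL) {
            /* Out of memory: undo the partial copy. */
            while (i-- > 0) {
                free(out[i]);
                out[i] = NULL;
            }
            return -1;
        }
        memcpy(out[i], row[i], len);
    }
    return 0;
}

/* Free a copy made by copy_row() and reset the slots to NULL. */
void free_row_copy(char **out, int ncols)
{
    int i;

    for (i = 0; i < ncols; i++) {
        free(out[i]);
        out[i] = NULL;
    }
}
```

With something like this, the caller would copy the fields while it still holds the socket, release the socket, and only then build attributes from the copy. The cost is an extra allocation per field; the other option raised in the side note, holding the socket until the row is no longer needed, avoids the copy but keeps the handle busy longer.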
Re: Error: CHILD: exit on signal (11)
On Wed, Aug 21, 2002 at 02:36:08PM -0500, Todd T. Fries wrote:
> On a side note, perhaps I should release the socket only when the access of
> the 'row' pointer is done? Or perhaps the api should be altered (again) to
> pass a pointer array into fetch_row so that the socket can be released
> without the potential for over-writing prior results?

This question was discussed at
http://www.mail-archive.com/freeradius-users@lists.cistron.nl/msg07312.html

--
Denis Tatarskikh [UdSU/MF] [UdSU/IC] mailto:[EMAIL PROTECTED]
Error: CHILD: exit on signal (11)
This is a log excerpt from a server using freeradius 0.7 authenticating
against mysql .. does anyone have a pointer for where I should start digging?

It seems to happen when the database is doing a hot-backup and is
unresponsive/slow for a few (10-15) minutes.

The 'useful' options from sql.conf should be:

    connect_failure_retry_delay = 15
    num_sql_socks = 18
    sqltrace = yes
    sqltracefile = ${logdir}/sqltrace.sql
    sql_user_name = %{Stripped-User-Name:-%{User-Name:-none}}
    # Uncomment simul_count_query to enable simultaneous use checking
    simul_count_query = SELECT COUNT(*) FROM ${acct_table1} WHERE UserName=...
    simul_verify_query = SELECT RadAcctId, AcctSessionId, UserName, NASIPAd...
    simul_zap_query = DELETE FROM ${acct_table1} WHERE RadAcctId = '%s'

Mon Aug 19 00:16:15 2002 : Error: WARNING: Unresponsive child (id 55299) for request 4861
Mon Aug 19 00:16:18 2002 : Error: WARNING: Unresponsive child (id 53252) for request 4862
Mon Aug 19 00:16:18 2002 : Error: WARNING: Unresponsive child (id 54274) for request 4863
Mon Aug 19 00:16:21 2002 : Error: WARNING: Unresponsive child (id 56325) for request 4864
Mon Aug 19 00:16:21 2002 : Error: WARNING: Unresponsive child (id 57350) for request 4865
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 59400) for request 4867
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 60425) for request 4868
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 61450) for request 4869
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 62475) for request 4870
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 63500) for request 4871
Mon Aug 19 00:16:41 2002 : Error: WARNING: Unresponsive child (id 58375) for request 4866
Mon Aug 19 00:16:44 2002 : Error: rlm_sql: There are no DB handles to use!
Mon Aug 19 00:16:44 2002 : Error: WARNING: Unresponsive child (id 64525) for request 4872
Mon Aug 19 00:16:47 2002 : Error: WARNING: Unresponsive child (id 65550) for request 4873
Mon Aug 19 00:16:47 2002 : Error: rlm_sql: There are no DB handles to use!
Mon Aug 19 00:17:37 2002 : Error: CHILD: exit on signal (11)
Mon Aug 19 06:38:43 2002 : Info: rlm_sql: Driver rlm_sql_mysql loaded and linked
Mon Aug 19 06:38:43 2002 : Info: rlm_sql: Attempting to connect to dbuser@dbserver:/db
Mon Aug 19 06:38:43 2002 : Info: rlm_sql: Starting connect to MySQL server for #0

--
Todd Fries .. [EMAIL PROTECTED]
Monte R. Lee and Company
Work 405-842-2405
Fax  405-848-8018
Web  www.mrleng.com
$ToddFries: signature.w,v 1.5 2002/05/23 19:48:19 todd Exp $
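The repeated "There are no DB handles to use!" lines suggest that every one of the num_sql_socks handles was busy when a new request arrived. A minimal sketch in C of a fixed-size handle pool whose caller treats exhaustion as a failed request instead of dereferencing a missing handle. The `demo_*` names are hypothetical and are not the rlm_sql API; only the pool size of 18 comes from the configuration above.

```c
#include <stddef.h>

#define DEMO_NUM_SOCKS 18   /* mirrors num_sql_socks = 18 from the post */

/* Hypothetical stand-ins for a DB handle and a pool of them. */
typedef struct {
    int id;
    int in_use;
} demo_sock;

typedef struct {
    demo_sock socks[DEMO_NUM_SOCKS];
} demo_pool;

void demo_pool_init(demo_pool *pool)
{
    int i;

    for (i = 0; i < DEMO_NUM_SOCKS; i++) {
        pool->socks[i].id = i;
        pool->socks[i].in_use = 0;
    }
}

/* Returns a free handle, or NULL when every handle is busy. */
demo_sock *demo_pool_get(demo_pool *pool)
{
    int i;

    for (i = 0; i < DEMO_NUM_SOCKS; i++) {
        if (!pool->socks[i].in_use) {
            pool->socks[i].in_use = 1;
            return &pool->socks[i];
        }
    }
    return NULL;
}

void demo_pool_release(demo_sock *sock)
{
    if (sock != NULL)
        sock->in_use = 0;
}

/* A caller that checks for exhaustion before touching the handle.
 * Returns 0 if a handle was obtained and released, -1 if none was free. */
int demo_authorize(demo_pool *pool)
{
    demo_sock *sock = demo_pool_get(pool);

    if (sock == NULL)
        return -1;          /* fail this request; do not dereference */

    /* ... a real module would run its query on `sock` here ... */

    demo_pool_release(sock);
    return 0;
}
```

The point of the sketch is the `sock == NULL` check in `demo_authorize`: a slow or locked database keeps handles checked out, the pool runs dry, and any caller that skips that check would then crash.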
Re: Error: CHILD: exit on signal (11)
Todd T. Fries [EMAIL PROTECTED] wrote:
> It seems to happen when the database is doing a hot-backup and is
> unresponsive/slow for a few (10-15) minutes.

If authorization depends on that database, and it goes down for
10-15 minutes, then there's not much point in running the server
during that time.

If the MySQL server really does disappear during backups, I'd
suggest doing something else to keep the RADIUS alive...

> Mon Aug 19 00:16:47 2002 : Error: rlm_sql: There are no DB handles to use!
> Mon Aug 19 00:17:37 2002 : Error: CHILD: exit on signal (11)

Hmm.. that's an unchecked de-referencing of a NULL pointer
somewhere. Without more information, it's hard to know where.

Alan DeKok.
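For readers unfamiliar with the log line: the "(11)" is a signal number, and on Linux signal 11 is SIGSEGV, which is what a NULL dereference normally raises. A minimal sketch in C, for POSIX systems, of how a parent process can see which signal killed a child. This is not radiusd's code; `child_exit_signal` is a hypothetical helper, and the child uses raise() rather than a real NULL dereference so that the behaviour is well defined.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that kills itself with `sig`, then report what the parent
 * sees through waitpid(): the terminating signal number, 0 if the child
 * exited normally, or -1 on error. */
int child_exit_signal(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        signal(sig, SIG_DFL);   /* make sure the default action applies */
        raise(sig);
        _exit(0);               /* reached only if the signal did not kill us */
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFSIGNALED(status))
        return WTERMSIG(status);

    return 0;
}
```

Calling `child_exit_signal(SIGSEGV)` returns SIGSEGV in the parent, which is the kind of information a message like "exit on signal (11)" reports. It says that a crash happened, not where, which is why a core dump and backtrace are needed to locate the faulting line.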
Re: Error: CHILD: exit on signal (11)
Todd T. Fries [EMAIL PROTECTED] wrote:
> > It seems to happen when the database is doing a hot-backup and is
> > unresponsive/slow for a few (10-15) minutes.
> >
> > Mon Aug 19 00:16:47 2002 : Error: rlm_sql: There are no DB handles to use!
> > Mon Aug 19 00:17:37 2002 : Error: CHILD: exit on signal (11)
>
> Hmm.. that's an unchecked de-referencing of a NULL pointer
> somewhere. Without more information, it's hard to know where.

This is something that we have been dealing with as well. Whenever the
database becomes unavailable (once it was a switch that died, once it was a
hard drive, etc.), the radius server segfaults. Sometimes it does so
immediately, and sometimes when the connection is re-established. This has
been the case with versions 0.4, 0.5 and 0.6 and some CVS versions in between.

I never did have enough time to track it down, so we've just been running a
keepalive script so that when it dies, it comes back up. When I was testing
earlier, I could reproduce this at will by bringing the connection to the
database down and sending queries to the radius server. This was the case
with both the postgres and oracle modules, anyway.

Hope that helps,
Josh Wilsdon

--
Josh Wilsdon [EMAIL PROTECTED]
Programmer Analyst
Wizard IT Services - http://www.wizard.ca
Linux Support Specialist - http://linuxmagic.com
Unix Administration, Website Hosting, Network Services, Programming
(604) 589-0037 Beautiful British Columbia, Canada
LinuxMagic is a TradeMark of Wizard Tower TechnoServices Ltd.
Re: Error: CHILD: exit on signal (11)
Eric Dean [EMAIL PROTECTED] wrote:
> Anybody know of a good way I can debug this so that I can let everyone know
> the source of this problem?

gdb? See doc/BUGS

Alan DeKok.
Error: CHILD: exit on signal (11)
Anybody know of a good way I can debug this so that I can let everyone know
the source of this problem?