On 09/25/09 12:50, Tom Yu wrote:
> Peter Shoults <Peter.Shoults at Sun.COM> writes:
>
>> I am sending this out again as the customer has tested a fix I provided
>> and determined that it does resolve their issue. I would like to move
>> forward with a fix, and the one I am using now is the one mentioned
>> below with the addition of signal(). If I can get some comments, I
>> would appreciate it. Otherwise, I guess I will just proceed to put
>> this into Solaris code.
>>
>> Pete
>
> Is there a reason to not use sigaction() here if possible? That
> should make things more portable than calling signal().

I used signal() because that is what this file already uses, so I
followed it to keep a consistent look and feel.

>> On 09/16/09 10:48, Peter Shoults wrote:
>>
>>> Hi,
>>>
>>> A customer has brought forward an issue they were having with Kerberos
>>> and LDAP, where LDAP is being used to store the database information
>>> for Kerberos. The issue is that if the LDAP server is restarted for any
>>> reason, then Kerberos does not automatically resync with the LDAP
>>> server when the LDAP server is back up and running. Specifically, one
>>> can run and log in to kadmin, but any commands that are run will fail
>>> with the error:
>>>
>>> "Communication failure with server while retrieving list."
>>>
>>> It turns out that if the user exits from kadmin and logs back in a
>>> second time, then the commands do work fine.
>>>
>>> I have determined that the cause of this problem is that when the LDAP
>>> server is restarted, all the connections we have on port 636 to the LDAP
>>> server go into a CLOSE_WAIT/FIN_WAIT_2 state. When we log into kadmin,
>>> we attempt to contact the LDAP server on these connections, and we
>>> receive SIGPIPE in response to our writes. Here is a snippet from truss:
>>>
>>> 3200/1: 57.2401 write(14, 0x0010B810, 23)
>>> Err#32 EPIPE
>>> 3200/1: 150301\012941A 60F Y P87A7BE9318B6
>>> c8C |0F v
>>> 3200/1: 57.2404 Received signal #13, SIGPIPE [caught]
>>>
>>> This is fine: the sig_pipe handler is invoked and we do print out the
>>> syslog message. However, we never reset the signal disposition for
>>> SIGPIPE. The kadmind process immediately proceeds to try the next
>>> connection to the LDAP server, and again gets SIGPIPE. This time
>>> though, the default handler is invoked, which terminates kadmind. At
>>> this point, SMF realizes kadmind has died and restarts it, which
>>> re-establishes all our connections to the LDAP server, and that explains
>>> why a subsequent login to kadmin will work.
>>>
>>> I have two questions about this. The first is why we have a handler for
>>> SIGPIPE in the kadmin code, unlike the krb5kdc code, which sets the
>>> SIGPIPE disposition to SIG_IGN. This handler in the kadmin code has not
>>> changed in a long, long time. I tested setting SIGPIPE to SIG_IGN, and
>>> this does allow a user to enter commands into kadmin after the LDAP
>>> server restarts and run commands without issue.
>>>
>>> Assuming we have the SIGPIPE handler specifically to output the syslog
>>> message, I propose that the handler reset the signal disposition to
>>> sig_pipe. I have also tested this fix and verified that it resolves the
>>> problem and allows the user to enter kadmin commands after the LDAP
>>> server restarts. Here is my change:
>>>
>>> file modified is ovsec_kadmd.c
>>>
>>> void
>>> sig_pipe(int unused)
>>> {
>>> +    signal(SIGPIPE, sig_pipe);
>>>     krb5_klog_syslog(LOG_NOTICE, gettext("Warning: Received a SIGPIPE; "
>>>         "probably a client aborted. Continuing."));
>>> }
>>>
>>>
>>> Pete
>>>
>> _______________________________________________
>> kerberos-discuss mailing list
>> kerberos-discuss at opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/kerberos-discuss