Peter Shoults <Peter.Shoults at Sun.COM> writes:
> I am sending this out again as the customer has tested a fix I provided
> and determined that it does resolve their issue. I would like to move
> forward with a fix, and the one I am using now is the one mentioned
> below with the addition of signal(). If I can get some comments, I
> would appreciate it. Otherwise, I guess I will just proceed to put
> this into Solaris code.
>
> Pete
Is there a reason not to use sigaction() here if possible? That should
make things more portable than calling signal(). With System V semantics
(as on Solaris), signal() resets the disposition to SIG_DFL when the
handler is invoked, while BSD-derived systems leave the handler
installed. sigaction() without SA_RESETHAND keeps the handler installed
on both.

> On 09/16/09 10:48, Peter Shoults wrote:
>> Hi,
>>
>> A customer has brought forward an issue they were having with Kerberos
>> and LDAP, where LDAP is being used to store the database information
>> for Kerberos. The issue is that if the LDAP server is restarted for any
>> reason, Kerberos does not automatically resync with the LDAP server
>> once it is back up and running. Specifically, one can run and log in to
>> kadmin, but any commands that are run fail with the error:
>>
>> "Communication failure with server while retrieving list."
>>
>> It turns out that if the user exits from kadmin and logs back in a
>> second time, the commands work fine.
>>
>> I have determined that the cause of this problem is that when the LDAP
>> server is restarted, all the connections we have on port 636 to the
>> LDAP server go into a CLOSE_WAIT/FIN_WAIT_2 state. When we log in to
>> kadmin, we attempt to contact the LDAP server on these connections, and
>> we receive SIGPIPE in response to our writes. Here is a snippet from
>> truss:
>>
>>     3200/1:     57.2401 write(14, 0x0010B810, 23)           Err#32 EPIPE
>>     3200/1:        150301\012941A 60F Y P87A7BE9318B6
>>        c8C |0F v
>>     3200/1:     57.2404 Received signal #13, SIGPIPE [caught]
>>
>> This is fine: the sig_pipe handler is invoked and we do print out the
>> syslog message. However, we never reset the signal disposition for
>> SIGPIPE. The kadmind process immediately proceeds to try the next
>> connection to the LDAP server, and again gets SIGPIPE. This time,
>> though, the default action is taken, which terminates kadmind. At this
>> point, SMF notices that kadmind has died and restarts it, which
>> re-establishes all our connections to the LDAP server. That explains
>> why a subsequent login to kadmin works.
>>
>> I have two questions about this.
>> The first is why we have a handler for SIGPIPE in the kadmin code at
>> all, unlike the krb5kdc code, which sets the SIGPIPE disposition to
>> SIG_IGN. This handler in the kadmin code has not changed in a long,
>> long time. I tested setting SIGPIPE to SIG_IGN, and this does allow a
>> user to enter commands in kadmin after the LDAP server restarts and
>> run them without issue.
>>
>> Assuming we have the SIGPIPE handler specifically to output the syslog
>> message, I propose that the handler re-install itself as the SIGPIPE
>> disposition each time it runs. I have also tested this fix and verified
>> that it resolves the problem as well, allowing the user to enter kadmin
>> commands after the LDAP server restarts. Here is my change:
>>
>> file modified is ovsec_kadmd.c
>>
>>     void
>>     sig_pipe(int unused)
>>     {
>>     +       signal(SIGPIPE, sig_pipe);
>>             krb5_klog_syslog(LOG_NOTICE, gettext("Warning: Received a SIGPIPE; "
>>                 "probably a client aborted. Continuing."));
>>     }
>>
>>
>> Pete
>>
>>
>
> _______________________________________________
> kerberos-discuss mailing list
> kerberos-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/kerberos-discuss