Hi,

A customer has reported an issue with Kerberos and LDAP, where LDAP is used to store the Kerberos database. If the LDAP server is restarted for any reason, Kerberos does not automatically reconnect to the LDAP server once it is back up and running. Specifically, one can start kadmin and log in, but any command that is run fails with the error:

    "Communication failure with server while retrieving list."

If the user exits kadmin and logs in a second time, the commands work fine.

I have determined that the cause is as follows. When the LDAP server is restarted, all the connections we hold on port 636 to the LDAP server go into a CLOSE_WAIT/FIN_WAIT_2 state. When we log into kadmin, we attempt to contact the LDAP server on these connections, and we receive SIGPIPE in response to our writes. Here is a snippet from truss:

    3200/1: 57.2401 write(14, 0x0010B810, 23) Err#32 EPIPE
    3200/1: 150301\012941A 60F Y P87A7BE9318B6 c8C |0F v
    3200/1: 57.2404 Received signal #13, SIGPIPE [caught]

This part is fine: the sig_pipe handler is invoked and we print the syslog message. However, we never reset the signal disposition for SIGPIPE, so it reverts to the default after the first delivery. The kadmind process immediately proceeds to try the next connection to the LDAP server and again gets SIGPIPE. This time the default action is taken, which terminates kadmind. At this point SMF notices that kadmind has died and restarts it, which re-establishes all our connections to the LDAP server. That explains why a subsequent login to kadmin works.

I have two questions about this.

First, why do we have a handler for SIGPIPE in the kadmin code at all, unlike the krb5kdc code, which sets the SIGPIPE disposition to SIG_IGN? This handler in the kadmin code has not changed in a very long time. I tested setting SIGPIPE to SIG_IGN, and with that change a user can run kadmin commands without issue after the LDAP server restarts.

Second, assuming we keep the SIGPIPE handler specifically to output the syslog message, I propose that the handler reset the signal disposition to sig_pipe. I have tested this fix as well and verified that it also resolves the problem: the user can run kadmin commands after the LDAP server restarts.

Here is my change. The file modified is ovsec_kadmd.c:

      void
      sig_pipe(int unused)
      {
    +     signal(SIGPIPE, sig_pipe);
          krb5_klog_syslog(LOG_NOTICE, gettext("Warning: Received a SIGPIPE; "
              "probably a client aborted. Continuing."));
      }

Pete
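
P.S. For anyone who wants to see the behaviour outside kadmind, here is a minimal standalone sketch. It is not kadmind code: all the function names are made up for illustration, the handlers only bump a counter instead of calling krb5_klog_syslog(), and a pipe with its read end closed stands in for the dead LDAP connection. It assumes a POSIX system and uses sigaction() with SA_RESETHAND to get the one-shot handler behaviour explicitly, so the result does not depend on which signal() semantics the platform provides.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Number of times a SIGPIPE handler has run. */
volatile sig_atomic_t pipe_hits = 0;

/* Stand-in for the current sig_pipe(): handles the signal, does not re-arm. */
void count_pipe(int unused)
{
    (void)unused;
    pipe_hits++;
}

/* Stand-in for the proposed sig_pipe(): re-installs itself first. */
void count_pipe_rearm(int unused)
{
    (void)unused;
    signal(SIGPIPE, count_pipe_rearm);
    pipe_hits++;
}

/* Install a handler that is reset to SIG_DFL when the signal is delivered. */
int install_one_shot(void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Write to a pipe with no reader; returns the errno from write(), or 0. */
int provoke_epipe(void)
{
    int fds[2];
    int err = 0;

    if (pipe(fds) != 0)
        return errno;
    close(fds[0]);                  /* no reader left */
    if (write(fds[1], "x", 1) < 0)  /* raises SIGPIPE, then fails */
        err = errno;
    close(fds[1]);
    return err;
}

/* Returns 1 if the current SIGPIPE disposition is SIG_DFL. */
int sigpipe_is_default(void)
{
    struct sigaction cur;

    if (sigaction(SIGPIPE, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}

/* Returns 1 if one SIGPIPE leaves the process back on the default action. */
int falls_back_after_one_sigpipe(void (*handler)(int))
{
    install_one_shot(handler);
    provoke_epipe();
    return sigpipe_is_default();
}
```

Calling falls_back_after_one_sigpipe(count_pipe) should report that the disposition is back to SIG_DFL, which is the state in which a second SIGPIPE kills the process. With count_pipe_rearm the handler stays installed and further failed writes just return EPIPE. Another option, which I have not tested in kadmind, would be to install the handler with sigaction() and no SA_RESETHAND, so that it stays in place without needing to re-arm itself.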