I posted about this a month or two ago and didn't see any responses. It happened again; I tried to open a bug report (not sure if it worked), and thought I'd try posting again.

After upgrading to 1.9.1, we've noticed that when a full resync is required kadmind behaves unexpectedly, spawning multiple processes to handle the same resync request, and ending up with multiple processes providing kadmin services. I'm not sure if this happens every full resync, but it has occurred multiple times.

It starts off with a full resync request:

Nov 2 03:49:58 halfy kadmind[8938]: Request: iprop_get_updates_1, UPDATE_FULL_RESYNC_NEEDED; Incoming SerialNo=102280; Outgoing SerialNo=N/A, success, client=kiprop/loogie.unx.csupomona.edu@CSUPOMONA.EDU, service=kiprop/[email protected], addr=134.71.247.11

A child process is spawned to serve that request:

Nov 2 03:49:58 halfy kadmind[8938]: Request: iprop_full_resync_1, spawned resync process 20238, client=kiprop/[email protected], service=kiprop/[email protected], addr=134.71.247.11

That process gets a strange error (which I'm not sure is relevant):

Nov 2 03:50:06 halfy kadmind[20238]: iprop_full_resync_1: pclose(popen) failed: Success

Then, rather than fulfilling the sync request, the child process spawns *another* child process:

Nov 2 03:52:56 halfy kadmind[20238]: Request: iprop_get_updates_1, UPDATE_FULL_RESYNC_NEEDED; Incoming SerialNo=102280; Outgoing SerialNo=N/A, success, client=kiprop/[email protected], service=kiprop/[email protected], addr=134.71.247.11
Nov 2 03:52:56 halfy kadmind[20238]: Request: iprop_full_resync_1, spawned resync process 20610, client=kiprop/[email protected], service=kiprop/[email protected], addr=134.71.247.11

There are no messages from that pid, and it seems to actually fulfill the sync request.

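In case it helps anyone check their own logs for the same pattern, here is a minimal sketch that pulls the parent/child spawn chain out of the "spawned resync process" messages. It assumes the syslog format shown in the excerpts above; the function names (spawn_chain, respawning_children) and the trimmed sample lines are purely illustrative and not part of any Kerberos tool.

```python
import re
from collections import defaultdict

# Matches kadmind syslog lines of the form shown above, e.g.
#   ... kadmind[8938]: Request: iprop_full_resync_1, spawned resync process 20238, ...
SPAWN_RE = re.compile(
    r"kadmind\[(\d+)\]: Request: iprop_full_resync_1, spawned resync process (\d+)"
)

def spawn_chain(lines):
    """Map each kadmind pid to the resync child pids it logged spawning."""
    children = defaultdict(list)
    for line in lines:
        match = SPAWN_RE.search(line)
        if match:
            parent, child = int(match.group(1)), int(match.group(2))
            children[parent].append(child)
    return dict(children)

def respawning_children(children):
    """Return pids that were spawned as resync children and then spawned children themselves."""
    spawned = {pid for kids in children.values() for pid in kids}
    return sorted(spawned & set(children))

# Sample lines taken from the log excerpts above, trimmed after the relevant fields.
SAMPLE = [
    "Nov 2 03:49:58 halfy kadmind[8938]: Request: iprop_get_updates_1, UPDATE_FULL_RESYNC_NEEDED;",
    "Nov 2 03:49:58 halfy kadmind[8938]: Request: iprop_full_resync_1, spawned resync process 20238,",
    "Nov 2 03:50:06 halfy kadmind[20238]: iprop_full_resync_1: pclose(popen) failed: Success",
    "Nov 2 03:52:56 halfy kadmind[20238]: Request: iprop_full_resync_1, spawned resync process 20610,",
]

if __name__ == "__main__":
    chain = spawn_chain(SAMPLE)
    print("spawn chain:", chain)
    print("children that spawned again:", respawning_children(chain))
```

Any pid reported by respawning_children is a resync child that went on to handle further iprop requests instead of exiting, which is the behavior described above.
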
At this point, *two* separate kadmind processes both seem to be fulfilling kadmin requests:

Nov 2 03:52:14 halfy kadmind[20238]: Request: kadm5_get_principal, [email protected], success, [email protected], service=kadmin/[email protected], addr=134.71.247.23
Nov 2 03:52:14 halfy kadmind[8938]: Request: kadm5_modify_principal, [email protected], success, [email protected], service=kadmin/[email protected], addr=134.71.247.23

The last time this happened, multiple generations of children were spawned, and there were half a dozen or so kadmind processes all serving requests.

On the kdc client side:

Nov 2 03:50:28 loogie kpropd[2911]: /usr/sbin/kpropd: Bad file descriptor while accepting connection
Nov 2 03:51:08 loogie kpropd[2911]: /usr/sbin/kpropd: Bad file descriptor while accepting connection
Nov 2 03:52:28 loogie kpropd[2911]: /usr/sbin/kpropd: Bad file descriptor while accepting connection
Nov 2 03:52:28 loogie kpropd[2911]: kpropd: Full resync, invalid return.
Nov 2 03:53:00 loogie kpropd[4221]: Connection from halfy.unx.csupomona.edu

kpropd complains about the failures and then works eventually.

From a client perspective, connections to kadmind start flaking out:

Nov 2 03:50:07 derp idmgmt[30265]: error storing expiration: Communication failure with server (Kerberos)
Nov 2 03:50:14 derp idmgmt[30265]: error storing expiration: Communication failure with server (Kerberos)
[...]
Nov 2 04:03:13 derp idmgmt[30265]: error getting principal: Communication failure with server (Kerberos)

We originally deployed incremental under 1.8, and this never happened. It seems to be something new with 1.9. Any ideas?

Thanks...

--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | [email protected]
California State Polytechnic University | Pomona CA 91768
________________________________________________
Kerberos mailing list [email protected]
https://mailman.mit.edu/mailman/listinfo/kerberos
