On Mon, Nov 25, 2013 at 09:23:22AM +1000, Matt Bryant wrote: > All, > > Was wondering if anyone can help out or point us the in right > direction. Ever since we updated from IPA v2.1 to IPA v3.0 have been > seeing some intermittent errors when trying to change passwords etc. > Getting the error cannot change password since system offline. > > Have increased the logging and found that quite frequently the > sasl_bind using the host principle is timing out and failing ... > (whether this is red herring or not dont know but cant be good > anyhow) > > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sdap_kinit_send] (0x0400): Attempting kinit (default, > host/tardis.ipa.server-noc.com, SERVER-IPA, 86400) > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sdap_kinit_next_kdc] (0x1000): Resolving next KDC for service IPA > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [fo_resolve_service_send] (0x0100): Trying to resolve service 'IPA' > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [get_server_status] (0x1000): Status of server > 'tardis.ipa.server-noc.com' is 'working' > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set > to 10 seconds > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [get_server_status] (0x1000): Status of server > 'tardis.ipa.server-noc.com' is 'working' > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [be_resolve_server_process] (0x1000): Saving the first resolved > server > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [be_resolve_server_process] (0x0200): Found address for server > tardis.ipa.server-noc.com: [203.147.190.30] TTL 7200 > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sdap_kinit_kdc_resolved] (0x1000): KDC resolved, attempting to get > TGT... > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [create_tgt_req_send_buffer] (0x1000): buffer size: 56 > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [child_handler_setup] (0x2000): Setting up signal handler up for pid > [3734] > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [child_handler_setup] (0x2000): Signal handler set up for pid [3734] > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [set_tgt_child_timeout] (0x0400): Setting 6 seconds timeout for tgt > child > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sdap_process_result] (0x2000): Trace: sh[0xa45960], connected[1], > ops[(nil)], ldap[0xa12200] > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sdap_process_result] (0x2000): Trace: ldap_result found nothing! > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [write_pipe_handler] (0x0400): All data has been sent! > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [child_sig_handler] (0x1000): Waiting for child [3734]. > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [child_sig_handler] (0x0100): child [3734] finished successfully. > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sss_child_handler] (0x2000): waitpid failed [10]: No child > processes > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [read_pipe_handler] (0x0400): EOF received, client finished > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sdap_get_tgt_recv] (0x0400): Child responded: 0 > [FILE:/var/lib/sss/db/ccache_SERVER-IPA], expired on [1385420045] > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sdap_cli_auth_step] (0x0100): expire timeout is 900 > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sdap_cli_auth_step] (0x1000): the connection will expire at > 1385334545 > (Mon Nov 25 08:54:05 2013) [sssd[be[ipa.server-noc.com]]] > [sasl_bind_send] (0x0100): Executing sasl bind mech: GSSAPI, user: > host/tardis.ipa.server-noc.com > (Mon Nov 25 08:54:11 2013) [sssd[be[ipa.server-noc.com]]] > [sasl_bind_send] (0x0020): ldap_sasl_bind failed (-5)[Timed out] > (Mon Nov 25 08:54:11 2013) [sssd[be[ipa.server-noc.com]]] > [sasl_bind_send] (0x0080): Extended failure message: [unknown error] > (Mon Nov 25 08:54:11 2013) [sssd[be[ipa.server-noc.com]]] > [fo_set_port_status] (0x0100): Marking port 0 of server > 'tardis.ipa.server-noc.com' as 'not working' > (Mon Nov 25 08:54:11 2013) [sssd[be[ipa.server-noc.com]]] > [sdap_handle_release] (0x2000): Trace: sh[0xa45960], connected[1], > ops[(nil)], ldap[0xa12200], destructor_lock[0], release_memory[0] > .. > .. > > it then goes on to connect to fail over server and succeed and > shortly after the master server is tested and marked as working > again ... works for a couple of times then time out again .. any > pointers gratefully received.
According to the logs I would say that this timeout happens on the LDAP server side. Do you have a chance to check the server logs to see what happens at this time on the server? You can increase the timeout value on the client by setting ldap_opt_timeout to a larger value then 6 (the default). But please note that the operation might take longer then letting SSSD fail over to the next server. HTH bye, Sumit > > > packages used ... > > ipa-server-3.0.0-25.el6.x86_64 > sssd-1.9.2-82.11.el6_4.x86_64 > > rgds > > Matt Bryant > > _______________________________________________ > Freeipa-users mailing list > [email protected] > https://www.redhat.com/mailman/listinfo/freeipa-users _______________________________________________ Freeipa-users mailing list [email protected] https://www.redhat.com/mailman/listinfo/freeipa-users
