On Thu, Aug 24, 2017 at 10:12:55AM +0200, Michael Gusek via FreeIPA-users wrote:
> Hello Jakub,
> 
> here the first lines of ldap_child.log
> 
> |(Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [main] (0x0400):
> ldap_child started. (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [main] (0x2000): context initialized (Wed Aug
> 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000):
> total buffer size: 81 (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): realm_str size: 16
> (Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer]
> (0x1000): got realm_str: IPA.EXAMPLE.COM (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): princ_str size: 41
> (Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer]
> (0x1000): got princ_str: host/ipa-lx-test-debian9.ípa.example.com (Wed
> Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer]
> (0x1000): keytab_name size: 0 (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [unpack_buffer] (0x1000): lifetime: 86400
> (Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [unpack_buffer]
> (0x0200): Will run as [0][0]. (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [privileged_krb5_setup] (0x2000): Kerberos
> context initialized (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [main] (0x2000): Kerberos context initialized
> (Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [become_user]
> (0x0200): Trying to become user [0][0]. (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [become_user] (0x0200): Already user [0].
> (Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]] [main] (0x2000):
> Running as [0][0]. (Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]]
> [main] (0x2000): getting TGT sync (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x2000): got
> realm_name: [IPA.EXAMPLE.COM] (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x0100): Principal
> name is: [host/ipa-lx-test-debian9.ípa.example....@ipa.example.com] (Wed
> Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]]
> [ldap_child_get_tgt_sync] (0x0100): Using keytab
> [MEMORY:/etc/krb5.keytab] (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [ldap_child_get_tgt_sync] (0x0100): Will
> canonicalize principals (Wed Aug 23 16:07:11 2017)
> [[sssd[ldap_child[2104]]]] [sss_child_krb5_trace_cb] (0x4000): [2104]
> 1503497231.122092: Getting initial credentials for
> host/ipa-lx-test-debian9.ípa.example....@ipa.example.com (Wed Aug 23
> 16:07:11 2017) [[sssd[ldap_child[2104]]]] [sss_child_krb5_trace_cb]
> (0x4000): [2104] 1503497231.122313: Looked up etypes in keytab:
> aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1,
> rc4-hmac (Wed Aug 23 16:07:11 2017) [[sssd[ldap_child[2104]]]]
> [sss_child_krb5_trace_cb] (0x4000): [2104] 1503497231.122451: Sending
> request (218 bytes) to IPA.EXAMPLE.COM (Wed Aug 23 16:07:17 2017)
> [[sssd[ldap_child[2104]]]] [sig_term_handler] (0x0010): Received signal
> [Terminated] [15], shutting down (Wed Aug 23 16:07:17 2017)
> [[sssd[ldap_child[2104]]]] [sig_term_handler] (0x0010): Unlink file
> [/var/lib/sss/db/ccache_IPA.EXAMPLE.COM_TmKHkD] (Wed Aug 23 16:07:17
> 2017) [[sssd[ldap_child[2105]]]] [main] (0x0400): ldap_child started.
> (Wed Aug 23 16:07:17 2017) [[sssd[ldap_child[2105]]]] [main] (0x2000):
> context initialized |
> 
> We connect to IPA.EXAMPLE.COM, this is not helpfull. You can see, there
> is a delay of 5 seconds. Later in this file, you can see, we try to
> connect to second server ipa-lx-test-02.ípa.example.com.

Right, sssd says it does, but I really wonder why the ldap_child
timeouts even in that case. Are there any log entries in the
ipa-lx-test-02.ípa.example.com's log files around the time sssd connects
to it?

And also -- could you run a simple tcpdump (tcpdump -i eth1 -x "port
88 or port 53") to see what hosts does sssd talk to and what hosts does
it discover?

What I'm wondering is -- does SSSD really talk to the right server at
that time or does it keep trying the bad one even if it should be trying
the one that is up?

By the way, I'm suprised that the failover doesn't work for you. This is a
quite basic feature that had been developed years ago. Can you also specify
exactly how you bring one of the IDM servers down? Do you power it off,
run ipactl stop, ...?

> 
> |(Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]]
> [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.357486: Getting
> initial credentials for
> host/ipa-lx-test-debian9.ípa.example....@ipa.example.com (Wed Aug 23
> 16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb]
> (0x4000): [2129] 1503497414.357715: Looked up etypes in keytab:
> aes256-cts, aes128-cts, aes256-sha2, aes128-sha2, des3-cbc-sha1,
> rc4-hmac (Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]]
> [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.357842: Sending
> request (218 bytes) to IPA.EXAMPLE.COM (Wed Aug 23 16:10:14 2017)
> [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129]
> 1503497414.360574: Resolving hostname ipa-lx-test-02.ípa.example.com.
> (Wed Aug 23 16:10:14 2017) [[sssd[ldap_child[2129]]]]
> [sss_child_krb5_trace_cb] (0x4000): [2129] 1503497414.362141: Sending
> initial UDP request to dgram <ip_of_ipa-lx-test-02>:88 (Wed Aug 23
> 16:10:14 2017) [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb]
> (0x4000): [2129] 1503497414.367725: Received answer (346 bytes) from
> dgram <ip_of_ipa-lx-test-02>:88 (Wed Aug 23 16:10:19 2017)
> [[sssd[ldap_child[2129]]]] [sss_child_krb5_trace_cb] (0x4000): [2129]
> 1503497419.374601: Response was from master KDC |
> 
> I think this was my test with ‘kinit ad...@ipa.example.com’. You can
> find whole log file attached. From my point of view, failover to second
> KDC is not fast enough and we should set some timeouts in sssd.conf or
> krb5.conf ?

Well, kinit doesn't talk to sssd..

sssd only has interfaces towards the NSS and PAM subsystems, so kinit
talks straight to libkrb5. What /KDCs/ does kinit talk to depends on
libkrb5 configuration. In your case, it doesn't contain any KDCs, so
it's either what SSSD tells libkrb5 to look at (through a kdcinfo file
interface) DNS (because dns_lookup_kdc = true) in this order. And the
kdcinfo files should not be generated unless sssd can switch to the
'online' mode here.

> 
> krb5.conf from ipa-lx-test-debian9:
> 
> |#File modified by ipa-client-install includedir
> /var/lib/sss/pubconf/krb5.include.d/ [libdefaults] default_realm =
> IPA.EXAMPLE.COM dns_lookup_realm = true dns_lookup_kdc = true rdns =
> false ticket_lifetime = 24h forwardable = yes default_ccache_name =
> KEYRING:persistent:%{uid} [realms] IPA.EXAMPLE.COM = { pkinit_anchors =
> FILE:/etc/ipa/ca.crt } [domain_realm] .ipa.example.com = IPA.EXAMPLE.COM
> ipa.example.com = IPA.EXAMPLE.COM |
> 
> Regards,
> 
> Michael
> 
> Am 23.08.2017 um 22:20 schrieb Jakub Hrozek via FreeIPA-users:
> 
> > On Wed, Aug 23, 2017 at 05:13:13PM +0200, Michael Gusek via FreeIPA-users 
> > wrote:
> >> Hi,
> >>
> >> we are testing a FreeIPA trust to an Active Directory. Trust itself
> >> works, we are happy. Now we tested a failure on FreeIPA site. We have
> >> two instances, both with same roles. If we poweroff first installed
> >> server, and clean sssd caches with restart of sssd on client side , sssd
> >> service can’t establish a connection to second instance.
> >>
> >> ipa-lx-test-01.ipa.example.com is the first installed FreeIPA with
> >> ipa-server-4.4.0-14.el7.centos.7.x86_64 on latest CentOS7
> >> ipa-lx-test-02.ipa.example.com is the second installed FreeIPA with
> >> ipa-server-4.4.0-14.el7.centos.7.x86_64 on latest CentOS7
> >> ipa-lx-test-debian9.ipa.example.com is a latest Debian 9.1 with sssd
> >> 1.15.0-3
> >>
> >> For deeper inspection full log is attached. In logs we found something
> >> like this:
> > OK, so as you say communication with the KDC failed:
> > (Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] 
> > [be_resolve_server_process] (0x0200): Found address for server 
> > ipa-lx-test-02.ipa.example.com: [x.x.x.x] TTL 1200                          
> >              
> > (Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] 
> > [sdap_kinit_kdc_resolved] (0x1000): KDC resolved, attempting to get TGT...  
> >                                                                             
> > (Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] 
> > [create_tgt_req_send_buffer] (0x0400): buffer size: 81                      
> >                                                                             
> > (Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] 
> > [child_handler_setup] (0x2000): Setting up signal handler up for pid [2104] 
> >                                                                             
> > (Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] 
> > [child_handler_setup] (0x2000): Signal handler set up for pid [2104]        
> >                                                                             
> > (Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] 
> > [set_tgt_child_timeout] (0x0400): Setting 6 seconds timeout for tgt child   
> >                                                                             
> > (Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] 
> > [sdap_process_result] (0x2000): Trace: sh[0x558252c35750], connected[1], 
> > ops[(nil)], ldap[0x558252c130c0]                                            
> >    
> > (Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] 
> > [sdap_process_result] (0x2000): Trace: end of ldap_result list              
> >                                                                             
> > (Wed Aug 23 16:07:11 2017) [sssd[be[ipa.example.com]]] [write_pipe_handler] 
> > (0x0400): All data has been sent!                                           
> >                                                        
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] 
> > [get_tgt_timeout_handler] (0x4000): timeout for sending SIGTERM to tgt 
> > child [2104] reached.                                                       
> >      
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] 
> > [get_tgt_timeout_handler] (0x0400): Setting 2 seconds timeout for sending 
> > SIGKILL to tgt child                                                        
> >   
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [read_pipe_handler] 
> > (0x0400): EOF received, client finished                                     
> >                                                         
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [child_sig_handler] 
> > (0x1000): Waiting for child [2104].                                         
> >                                                         
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [child_sig_handler] 
> > (0x0020): child [2104] failed with status [7].                              
> >                                                         
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [child_callback] 
> > (0x0020): LDAP child was terminated due to timeout                          
> >                                                            
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [sdap_kinit_done] 
> > (0x0080): Communication with KDC timed out, trying the next one             
> >                                                           
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] 
> > [_be_fo_set_port_status] (0x8000): Setting status: PORT_NOT_WORKING. Called 
> > from: ../src/providers/ldap/sdap_async_connection.c: sdap_kinit_done: 1207  
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [fo_set_port_status] 
> > (0x0100): Marking port 389 of server 'ipa-lx-test-02.ipa.example.com' as 
> > 'not working'                                             
> > (Wed Aug 23 16:07:17 2017) [sssd[be[ipa.example.com]]] [fo_set_port_status] 
> > (0x0400): Marking port 389 of duplicate server 
> > 'ipa-lx-test-02.ipa.example.com' as 'not working' 
> >
> > Could you check in the ldap_child.log which KDC did SSSD try to talk to
> > and what takes so long?
> > _______________________________________________
> > FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
> > To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
> 
> ​
> -- 
> 
> ________________________________________________
> 
> 
> *Michael**Gusek*| System Administrator| Webtrekk GmbH |
> *t*+49 30 755 415 302| *f *+49 30 755 415 100 | *w *www.webtrekk.com
> <https://www.webtrekk.com/?wt_mc=signature.-.-.-.homepageURL>
> Amtsgericht/Local Court Berlin, HRB 93435 B | Geschäftsführer/CEO
> Christian Sauer und Wolf Lichtenstein
> 
> 

> _______________________________________________
> FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
> To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org
_______________________________________________
FreeIPA-users mailing list -- freeipa-users@lists.fedorahosted.org
To unsubscribe send an email to freeipa-users-le...@lists.fedorahosted.org

Reply via email to