On (12/05/16 11:03), Harald Dunkel wrote:
>On 05/12/16 10:26, Lukas Slebodnik wrote:
>> On (12/05/16 09:42), Harald Dunkel wrote:
>>> It happened again :-(. This *really* needs to be fixed.
>>> I wouldn't like to move back to ypbind.
>> I would like to, if I knew what to fix and how to reliably reproduce it.
>It would be very nice if sssd could become more reliable at
>startup time. It gives up too easily. And it is not restarted
>in case of a problem, which is fatal for a service providing
>access to a user database.
It would be nice if you could provide a reliable reproducer.
I'm sorry, but we do not have a crystal ball, and the sssd log files
did not help either. They are truncated.

I would like to fix it but I do not know what to fix.

Is there anything interesting/suspicious in syslog/journald
from the same time?

>>> Logfiles are attached. sssd is version 1.13.3. The server
>>> was rebooted at 05:56. At 06:03:18 sssd wrote the first
>>> logfile entries.
>> I cannot see in the log files that sssd was started.
>(Thu May 12 05:56:12 2016) [sssd] [monitor_quit] (0x0020): Child [sudo] exited 
>(Thu May 12 05:56:12 2016) [sssd] [monitor_quit] (0x0020): Terminating 
>(Thu May 12 05:56:12 2016) [sssd] [monitor_quit] (0x0020): Child [nss] exited 
>(Thu May 12 06:03:18 2016) [sssd] [sysdb_domain_init_internal] (0x0200): DB 
>File for example.com: /var/lib/sss/db/cache_example.com.ldb
>(Thu May 12 06:03:20 2016) [sssd] [get_ping_config] (0x0100): Time between 
>service pings for [example.com]: [10]
>(Thu May 12 06:03:20 2016) [sssd] [get_ping_config] (0x0100): Time between 
>SIGTERM and SIGKILL for [example.com]: [60]
>(Thu May 12 06:03:20 2016) [sssd] [start_service] (0x0100): Queueing service 
>example.com for startup
>(Thu May 12 06:03:22 2016) [sssd] [sbus_server_init_new_connection] (0x0200): 
I saw these lines, but the messages about the startup of sssd are missing.
Something like:
  [server_setup] (0x0400): CONFDB: /var/lib/sss/db/config.ldb
  [dp_get_options] (0x0400): Option lookup_family_order has value ipv4_first
  [dp_get_options] (0x0400): Option dns_resolver_timeout has value 6
  [dp_get_options] (0x0400): Option dns_resolver_op_timeout has value 6
  [dp_get_options] (0x0400): Option dns_discovery_domain has no value
  [be_res_get_opts] (0x0100): Lookup order: ipv4_first
  [recreate_ares_channel] (0x0100): Initializing new c-ares channel
  [fo_context_init] (0x0400): Created new fail over context, retry timeout is 30
  [confdb_get_domain_internal] (0x0400): No enumeration for [example.com]!
  [confdb_get_domain_internal] (0x1000): pwd_expiration_warning is -1
  [sysdb_domain_init_internal] (0x0200): DB File for example.com: 
  [sbus_init_connection] (0x0400): Adding connection 0x55b875a67cc0
  [sbus_add_watch] (0x2000): 0x55b875a68ae0/0x55b875a67590 (15), -/W (enabled)
  [sbus_toggle_watch] (0x4000): 0x55b875a68ae0/0x55b875a675e0 (15), R/- 
  [sbus_opath_hash_add_iface] (0x0400): Registering interface 
org.freedesktop.sssd.service with path /org/freedesktop/sssd/service
  [sbus_conn_register_path] (0x0400): Registering object path 
/org/freedesktop/sssd/service with D-Bus connection
  [sbus_opath_hash_add_iface] (0x0400): Registering interface 
org.freedesktop.DBus.Properties with path /org/freedesktop/sssd/service
  [sbus_opath_hash_add_iface] (0x0400): Registering interface 
org.freedesktop.DBus.Introspectable with path /org/freedesktop/sssd/service
  [monitor_common_send_id] (0x0100): Sending ID: (%BE_example.com,1)

>> The log files seem to be truncated, and there seems to be a problem
>> with network communication.
>> [be_resolve_server_process] (0x0200): Found address for server 
>> ipa2.example.com: [] TTL 7200
>> [init_timeout] (0x0040): Client timed out before Identification [0x12d50c0]!
>> [sdap_kinit_done] (0x0080): Communication with KDC timed out, trying the 
>> next one
>> [fo_set_port_status] (0x0100): Marking port 389 of server 'ipa2.example.com' 
>> as 'not working'
>You have cut off the time stamps. Here they are:
That was on purpose, because it's clear that "Communication with KDC timed out".
The question is why.
6 seconds should be enough unless you are trying to connect to a server
located on the opposite side of the globe.

>(Thu May 12 06:03:31 2016) [sssd[be[example.com]]] [be_resolve_server_process] 
>(0x0200): Found address for server ipa2.example.com: [] TTL 7200
>(Thu May 12 06:03:36 2016) [sssd[be[example.com]]] [init_timeout] (0x0040): 
>Client timed out before Identification [0x12d50c0]!
>(Thu May 12 06:03:37 2016) [sssd[be[example.com]]] [sdap_kinit_done] (0x0080): 
>Communication with KDC timed out, trying the next one
>(Thu May 12 06:03:37 2016) [sssd[be[example.com]]] [fo_set_port_status] 
>(0x0100): Marking port 389 of server 'ipa2.example.com' as 'not working'
>Obviously the 5 secs timeout is not sufficient for stable
>operation. I am not sure if that's the reason for sssd to
>go away, though.
The default value of ldap_opt_timeout is 6 seconds.
You might try to increase it, but it will not help
if ipa2.example.com is unresponsive.

It will just complicate the situation, because sssd will then fall back later
to another server (ipa1.example.com). You might see in the log files
that communication with ipa1.example.com was successful.
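
If you still want to experiment with the timeout, a minimal sssd.conf
fragment might look like the one below. The domain section name is taken
from your logs; the value 10 is only an illustrative assumption, not a
recommendation.

```ini
# /etc/sssd/sssd.conf (fragment, illustrative only)
[domain/example.com]
# Default is 6 seconds. A larger value only delays the failover to the
# next server if ipa2.example.com does not answer at all.
ldap_opt_timeout = 10
```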

>> Do you have mounted nfs on /var/log/ or anywhere else?
>Surely not. All mount points are local.
Thank you.

>> It can explain a lot if there are network related issues.
>I don't see why there should be any network related issues.
>The ipa servers were available all the time. The network
>is configured statically.
If there is no problem with the network, then
can you explain why sssd was not able to communicate
with ipa2.example.com?
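
One way to gather some evidence would be to time a plain TCP connect to
both servers shortly after boot. The following is only a rough sketch:
the helper names probe and check_servers are made up for this mail, port
389 comes from your logs, and port 88 is the standard Kerberos port,
which I am assuming your KDC also answers on over TCP. A successful
connect only shows that the port is reachable; it does not prove that
the kinit done by sssd would succeed.

```python
import socket
import time


def probe(host, port, timeout=6.0):
    """Try one TCP connect; return (ok, seconds_elapsed, error_text).

    The default timeout mirrors the 6 second default of ldap_opt_timeout.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the name and connects, so a slow or
        # failing DNS lookup shows up here as well.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers timeouts, refused connections, DNS errors
        return False, time.monotonic() - start, str(exc)


def check_servers(hosts, ports, timeout=6.0):
    """Probe every host/port pair and return a list of result tuples."""
    results = []
    for host in hosts:
        for port in ports:
            ok, elapsed, err = probe(host, port, timeout)
            results.append((host, port, ok, elapsed, err))
    return results
```

Calling check_servers(["ipa1.example.com", "ipa2.example.com"], [88, 389])
right after a reboot and printing the result would show whether
ipa2.example.com is slow or unreachable at that moment.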

