Cache is verified valid by looking at the cache files /var/lib/sss/db/ ldb files.
Also, if I lookup the user on the IPA server I get a fast response. Looking up the user on a client which have a valid cache return the user within a few ms or secs. Invalidating the cache on the client with sss_cache results in the user being fetched from the IPA server. This takes about 15-20 seconds. This is acceptable. If I do a tcpdump of ldap traffic while this happens I only see traffic between sssd client and ipa server. Then I remove the cache files and restarts sssd on the client, simulating a newly joined client or a client that have cleaned up its ldb cache (ldap_purge_cache_timeout=0 only set on the server). At this point the IPA server still have a valid cache. Looking up the user while tcpdumping on IPA server I can see what looks like IPA fetches the entire information from AD. This is ~730 groups, mostly nested groups for this user. This takes 5-7 minutes........ The LDAP responds quickly, no CPU load, and I can do manual LDAP lookups then the user is fetched, so nothing that points to a busy LDAP. It doesn't matter if I do a lookup of the user or he logs in. Its the same behavior. Currently its very limited load, but this is going to scale up later on. ----- On Feb 8, 2017, at 2:12 PM, Sullivan, Daniel [CRI] [email protected] wrote: > Are you actually logging in or or just doing a lookup on a user? I remember > reading somewhere that groups are always re-evaluated at the point of login, > regardless of what is in the cache. I am not sure if this is accurate or the > implications of whether or not it is on the client, server or both, but this > could be what is happening. > > Also, how are verifying that the group in the cache on the server is valid? > If > you do a lookup a user on the server after the mapped memory cache expires > (i.e. if you restart sssd), do you see a log entry that a valid cache entry > has > been found and that entry is being returned (from memory it is in the nsss > logs)? It might help you to focus on the sysdb cache by setting a very low > timeout value for the in memory mapped cache and or negative cache timeouts. > Based on my experiencif a valid cache entry in the sysdb *NOT* found, a log > entry is not returned (just a lookup is performed), which for an untrained eye > makes it difficult to assess when timeouts are occurring in the sysdb cache. > > Also, five minutes is a ridiculously long time to have to wait, have you > inspected the performance of your 389 server? Is your environment large or > under large load? Based on experience and another thread from this list, I > have seen the 389ds threads get tied up due to extop operations because AD > lookups via SSSD eventually will be blocked sssd under high load. > > I had to significantly scale 389ds worker threads and processor cores to > accommodate large numbers of concurrent AD lookups, this improved the > performance of our environment and solved many issues. > > Dan > >> On Feb 8, 2017, at 1:27 AM, Troels Hansen <[email protected]> wrote: >> >> No, ignore_group_members option is already set. >> Tried setting and removing it on the client but didn't make a huge different. >> >> Is a client have an completely empty cache, newly joined, empty >> /var/lig/sssd/db >> and mc cache or something, the IPA server ALWAYS asks the AD for group info, >> despite having a valid cache. >> >> This leads to a delay of up to, or even more 5 minutes before a user can >> login. >> Being denied access at first, but logging again later with success. >> >> If the client have a cache thats only expired its able to pull the account >> info >> from the cache on the IPA, without the IPA having to contact the AD. >> >> >> ----- On Feb 6, 2017, at 2:24 PM, Sullivan, Daniel [CRI] >> [email protected] wrote: >> >>> Have you looked at the ignore_group_members option? Maybe this is the >>> problem >>> you are seeing? >>> >>> https://jhrozek.wordpress.com/2015/08/19/performance-tuning-sssd-for-large-ipa-ad-trust-deployments/ >>> >>> ==snip== >>> >>> ignore_group_members >>> Normally the most data-intensive operation is downloading the groups >>> including >>> their members. Usually, we are interested in what groups a user is a member >>> of >>> (id aduser@ad_domain) as the initial step rather than what members do >>> specific >>> groups include (getent group adgroup@ad_domain). Setting the >>> ignore_group_members option to True makes all groups appear as empty, thus >>> downloading only information about the group objects themselves and not >>> their >>> members, providing a significant performance boost. Please note that id >>> aduser@ad_domain would still return all the correct groups! >>> ==snip== >>> >>> >>> Dan >>> >>> On Feb 6, 2017, at 1:59 AM, Troels Hansen <[email protected]> wrote: >>> >>> Hi >>> >>> I'm aware of the anatomy of how the lookup is done, but I would suppose a >>> valid >>> cache on the IPA server would result in the cache from the IPA server being >>> used? >>> >>> I have been debugging this issue some more, and can confirm is the client >>> have >>> its sssd cache invalidated by "sss_cache -E" and the IPA server have a valid >>> cache, when the client asks for the user, the IPA server still asks the AD >>> for >>> the entire group info? >>> >>> I can see, that even though the cache is refreshed the attribute >>> initgrExpireTimestamp (in the ldb cache) isn't updated. >>> I have been unable to find out exactly what this controls? >>> >>> lastUpdate and dataExpireTimestamp is updated to the time stamp of when the >>> refresh ran. >>> >>> >>> ----- On Feb 1, 2017, at 2:27 PM, Sullivan, Daniel [CRI] >>> [email protected] wrote: >>> >>> Have you checked to see if the user is expired in the cache, or if it is >>> impacted by entry_cache_nowait_percentage (ref sssd.conf). The default entry >>> timeout is only 90 minutes and entry_cache_nowait_percentage default is 50. >>> >>> ldbsearch -H /var/lib/sss/db/timestamps_xxx.xxx.uchicago.edu.ldb > >>> ~/timestamps.txt >>> ldbsearch -H /var/lib/sss/db/cache_xxx.xxx.uchicago.edu.ldb > ~/cache.txt >>> >>> Might be able to provide more info. >>> >>> Again, I’d really familiarize yourself with the anatomy of an sssd lookup, >>> if >>> you get to know the diagram and steps 1-7 here it will be a big help to your >>> question(s). >>> https://jhrozek.wordpress.com/2015/03/11/anatomy-of-sssd-user-lookup/ >>> >>> Dan >>> >>> >>> On Feb 1, 2017, at 4:32 AM, Troels Hansen <[email protected]> wrote: >>> >>> From looking af at TCP dump, I can see that if a client requests a AD user >>> from >>> IPA, IPA does a full user lookup in AD, even though the IPA server have the >>> user in local cache? >>> >>> It looks like a single group generates a LOT of traffic to AD like: >>> client -> IPA >>> IPA -> client >>> IPA -> AD >>> AD -> IPA >>> IPA -> AD >>> IPA -> Second AD >>> Second AD -> IPA >>> IPA -> Second AD >>> IPA -> AD >>> AD -> IPA >>> IPA -> AD >>> IPA -> client >>> client -> IPA >>> IPA -> Client >>> >>> Something similar continues for every group the user has. >>> >>> Soo, I guess the question that I haven't been able to find documented >>> anywhere. >>> Isn't the SSSD group cache on the IPA servers supposed to be used then a >>> sssd >>> client requests a user? >>> >>> >>> >>> ----- On Feb 1, 2017, at 9:53 AM, Troels Hansen [email protected] wrote: >>> >>> Hmm, suspect its happening on the server...... thous I haven't been able to >>> pinpoint a log entry that confirms my suspecting. >>> >>> I have pinpointed the timeout to happen after 58 seconds after completely >>> removing the SSSD cache and restaring SSSD, which leads me to think my >>> issue is >>> related to ldap_enumaration_search_timeout which defaults to 60. >>> >>> rm -fr /var/lib/sss/{mc,db}/* && systemctl restart sssd && time id shja >>> >>> However, I'm unable to crank it up on the server as it seems to be adjusted >>> Down >>> to 60 Again? >>> >>> Wed Feb 1 09:40:56 2017) [sssd[be[lx.dr.dk]]] [dp_get_options] (0x0400): >>> Option >>> ldap_enumeration_search_timeout has value 120 >>> (Wed Feb 1 09:40:56 2017) [sssd[be[lx.dr.dk]]] [dp_copy_options_ex] >>> (0x0400): >>> Option ldap_enumeration_search_timeout has value 60 >>> (Wed Feb 1 09:40:56 2017) [sssd[be[lx.dr.dk]]] [dp_copy_options_ex] >>> (0x0400): >>> Option ldap_enumeration_search_timeout has value 60 >>> >>> LDAP seems speedy enough, not timeouts while querying it manually while a >>> client >>> is doing a user lookup. >>> >>> ----- On Jan 30, 2017, at 6:06 PM, Sullivan, Daniel [CRI] >>> [email protected] wrote: >>> >>> >>> If the timeout is occurring on the server, I would start by increasing one >>> or >>> both of these values: >>> >>> ldap_opt_timeout >>> ldap_search_timeout >>> >>> If that doesn’t work I’d take look to see if the 389 server is under high >>> load >>> when you are performing this operation. The easiest way I have found to do >>> this is to just execute an LDAP query directly against the IPA server when >>> you >>> are performing an id lookup, for example: >>> >>> ldapsearch -D "cn=Directory Manager" -w <pw> -s base -b "cn=config" >>> "(objectclass=*)” >>> >>> If the LDAP server is not responsive you probably need to increase the >>> number of >>> worker threads for 389ds. Also, you might want to disable referrals, check >>> out >>> the man pages for this; >>> >>> ldap_referrals = false >>> >>> Also, FWIW, if you crank up debug logging on the sssd client, you should be >>> able >>> to see the amount of seconds of timeout assigned to the operation, and >>> witness >>> the fact that the operation is actually timing out by inspecting the logs >>> themselves. The logs can get a little verbose but the data is there. >>> >>> >>> -- >>> Manage your subscription for the Freeipa-users mailing list: >>> https://www.redhat.com/mailman/listinfo/freeipa-users >>> Go to http://freeipa.org for more info on the project >>> >>> -- >>> Med venlig hilsen >>> >>> Troels Hansen >>> >>> Systemkonsulent >>> >>> Casalogic A/S >>> >>> >>> T (+45) 70 20 10 63 >>> >>> M (+45) 22 43 71 57 >>> >>> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og >>> meget mere. >>> >>> -- >>> Manage your subscription for the Freeipa-users mailing list: >>> https://www.redhat.com/mailman/listinfo/freeipa-users >>> Go to http://freeipa.org for more info on the project >>> >>> -- >>> Med venlig hilsen >>> >>> Troels Hansen >>> >>> Systemkonsulent >>> >>> Casalogic A/S >>> >>> >>> T (+45) 70 20 10 63 >>> >>> M (+45) 22 43 71 57 >>> >>> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og >>> meget mere. >> >> -- >> Med venlig hilsen >> >> Troels Hansen >> >> Systemkonsulent >> >> Casalogic A/S >> >> >> T (+45) 70 20 10 63 >> >> M (+45) 22 43 71 57 >> >> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og > > meget mere. -- Med venlig hilsen Troels Hansen Systemkonsulent Casalogic A/S T (+45) 70 20 10 63 M (+45) 22 43 71 57 Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og meget mere. -- Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users Go to http://freeipa.org for more info on the project
