Re: [Freeipa-users] Needs help understand this timeout issue

Sullivan, Daniel [CRI] Wed, 08 Feb 2017 05:17:23 -0800

Are you actually logging in or or just doing a lookup on a user?  I remember 
reading somewhere that groups are always re-evaluated at the point of login, 
regardless of what is in the cache.  I am not sure if this is accurate or the 
implications of whether or not it is on the client, server or both, but this 
could be what is happening.


Also, how are verifying that the group in the cache on the server is valid?  If 
you do a lookup a user on the server after the mapped memory cache expires 
(i.e. if you restart sssd), do you see a log entry that a valid cache entry has 
been found and that entry is being returned (from memory it is in the nsss 
logs)?  It might help you to focus on the sysdb cache by setting a very low 
timeout value for the in memory mapped cache and or negative cache timeouts. 
Based on my experiencif a valid cache entry in the sysdb *NOT* found, a log 
entry is not returned (just a lookup is performed), which for an untrained eye 
makes it difficult to assess when timeouts are occurring in the sysdb cache. 

Also, five minutes is a ridiculously long time to have to wait, have you 
inspected the performance of your 389 server?  Is your environment large or 
under large load?  Based on experience and another thread from this list, I 
have seen the 389ds threads get tied up due to extop operations because AD 
lookups via SSSD eventually will be blocked sssd under high load.  

I had to significantly scale 389ds worker threads and processor cores to 
accommodate large numbers of concurrent AD lookups, this improved the 
performance of our environment and solved many issues.

Dan

> On Feb 8, 2017, at 1:27 AM, Troels Hansen <[email protected]> wrote:
> 
> No, ignore_group_members option is already set.
> Tried setting and removing it on the client but didn't make a huge different.
> 
> Is a client have an completely empty cache, newly joined, empty 
> /var/lig/sssd/db and mc cache or something, the IPA server ALWAYS asks the AD 
> for group info, despite having a valid cache.
> 
> This leads to a delay of up to, or even more 5 minutes before a user can 
> login. Being denied access at first, but logging again later with success.
> 
> If the client have a cache thats only expired its able to pull the account 
> info from the cache on the IPA, without the IPA having to contact the AD.
> 
> 
> ----- On Feb 6, 2017, at 2:24 PM, Sullivan, Daniel [CRI] 
> [email protected] wrote:
> 
>> Have you looked at the ignore_group_members option?  Maybe this is the 
>> problem
>> you are seeing?
>> 
>> https://jhrozek.wordpress.com/2015/08/19/performance-tuning-sssd-for-large-ipa-ad-trust-deployments/
>> 
>> ==snip==
>> 
>> ignore_group_members
>> Normally the most data-intensive operation is downloading the groups 
>> including
>> their members. Usually, we are interested in what groups a user is a member 
>> of
>> (id aduser@ad_domain) as the initial step rather than what members do 
>> specific
>> groups include (getent group adgroup@ad_domain). Setting the
>> ignore_group_members option to True makes all groups appear as empty, thus
>> downloading only information about the group objects themselves and not their
>> members, providing a significant performance boost. Please note that id
>> aduser@ad_domain would still return all the correct groups!
>> ==snip==
>> 
>> 
>> Dan
>> 
>> On Feb 6, 2017, at 1:59 AM, Troels Hansen <[email protected]> wrote:
>> 
>> Hi
>> 
>> I'm aware of the anatomy of how the lookup is done, but I would suppose a 
>> valid
>> cache on the IPA server would result in the cache from the IPA server being
>> used?
>> 
>> I have been debugging this issue some more, and can confirm is the client 
>> have
>> its sssd cache invalidated by "sss_cache -E" and the IPA server have a valid
>> cache, when the client asks for the user, the IPA server still asks the AD 
>> for
>> the entire group info?
>> 
>> I can see, that even though the cache is refreshed the attribute
>> initgrExpireTimestamp (in the ldb cache) isn't updated.
>> I have been unable to find out exactly what this controls?
>> 
>> lastUpdate and dataExpireTimestamp is updated to the time stamp of when the
>> refresh ran.
>> 
>> 
>> ----- On Feb 1, 2017, at 2:27 PM, Sullivan, Daniel [CRI]
>> [email protected] wrote:
>> 
>> Have you checked to see if the user is expired in the cache, or if it is
>> impacted by entry_cache_nowait_percentage (ref sssd.conf). The default entry
>> timeout is only 90 minutes and entry_cache_nowait_percentage default is 50.
>> 
>> ldbsearch -H  /var/lib/sss/db/timestamps_xxx.xxx.uchicago.edu.ldb >
>> ~/timestamps.txt
>> ldbsearch -H  /var/lib/sss/db/cache_xxx.xxx.uchicago.edu.ldb > ~/cache.txt
>> 
>> Might be able to provide more info.
>> 
>> Again, I’d really familiarize yourself with the anatomy of an sssd lookup, if
>> you get to know the diagram and steps 1-7 here it will be a big help to your
>> question(s).
>> https://jhrozek.wordpress.com/2015/03/11/anatomy-of-sssd-user-lookup/
>> 
>> Dan
>> 
>> 
>> On Feb 1, 2017, at 4:32 AM, Troels Hansen <[email protected]> wrote:
>> 
>> From looking af at TCP dump, I can see that if a client requests a AD user 
>> from
>> IPA, IPA does a full user lookup in AD, even though the IPA server have the
>> user in local cache?
>> 
>> It looks like a single group generates a LOT of traffic to AD like:
>> client -> IPA
>> IPA -> client
>> IPA -> AD
>> AD -> IPA
>> IPA -> AD
>> IPA -> Second AD
>> Second AD -> IPA
>> IPA -> Second AD
>> IPA -> AD
>> AD -> IPA
>> IPA -> AD
>> IPA -> client
>> client -> IPA
>> IPA -> Client
>> 
>> Something similar continues for every group the user has.
>> 
>> Soo, I guess the question that I haven't been able to find documented 
>> anywhere.
>> Isn't the SSSD group cache on the IPA servers supposed to be used then a sssd
>> client requests a user?
>> 
>> 
>> 
>> ----- On Feb 1, 2017, at 9:53 AM, Troels Hansen [email protected] wrote:
>> 
>> Hmm, suspect its happening on the server...... thous I haven't been able to
>> pinpoint a log entry that confirms my suspecting.
>> 
>> I have pinpointed the timeout to happen after 58 seconds after completely
>> removing the SSSD cache and restaring SSSD, which leads me to think my issue 
>> is
>> related to ldap_enumaration_search_timeout which defaults to 60.
>> 
>> rm -fr /var/lib/sss/{mc,db}/* && systemctl restart sssd && time id shja
>> 
>> However, I'm unable to crank it up on the server as it seems to be adjusted 
>> Down
>> to 60 Again?
>> 
>> Wed Feb  1 09:40:56 2017) [sssd[be[lx.dr.dk]]] [dp_get_options] (0x0400): 
>> Option
>> ldap_enumeration_search_timeout has value 120
>> (Wed Feb  1 09:40:56 2017) [sssd[be[lx.dr.dk]]] [dp_copy_options_ex] 
>> (0x0400):
>> Option ldap_enumeration_search_timeout has value 60
>> (Wed Feb  1 09:40:56 2017) [sssd[be[lx.dr.dk]]] [dp_copy_options_ex] 
>> (0x0400):
>> Option ldap_enumeration_search_timeout has value 60
>> 
>> LDAP seems speedy enough, not timeouts while querying it manually while a 
>> client
>> is doing a user lookup.
>> 
>> ----- On Jan 30, 2017, at 6:06 PM, Sullivan, Daniel [CRI]
>> [email protected] wrote:
>> 
>> 
>> If the timeout is occurring on the server, I would start by increasing one or
>> both of these values:
>> 
>> ldap_opt_timeout
>> ldap_search_timeout
>> 
>> If that doesn’t work I’d take look to see if the 389 server is under high 
>> load
>> when you are performing this operation.  The easiest way I have found to do
>> this is to just execute an LDAP query directly against the IPA server when 
>> you
>> are performing an id lookup, for example:
>> 
>> ldapsearch -D "cn=Directory Manager" -w <pw> -s base -b "cn=config"
>> "(objectclass=*)”
>> 
>> If the LDAP server is not responsive you probably need to increase the 
>> number of
>> worker threads for 389ds.  Also, you might want to disable referrals, check 
>> out
>> the man pages for this;
>> 
>> ldap_referrals = false
>> 
>> Also, FWIW, if you crank up debug logging on the sssd client, you should be 
>> able
>> to see the amount of seconds of timeout assigned to the operation, and 
>> witness
>> the fact that the operation is actually timing out by inspecting the logs
>> themselves.  The logs can get a little verbose but the data is there.
>> 
>> 
>> --
>> Manage your subscription for the Freeipa-users mailing list:
>> https://www.redhat.com/mailman/listinfo/freeipa-users
>> Go to http://freeipa.org for more info on the project
>> 
>> --
>> Med venlig hilsen
>> 
>> Troels Hansen
>> 
>> Systemkonsulent
>> 
>> Casalogic A/S
>> 
>> 
>> T (+45) 70 20 10 63
>> 
>> M (+45) 22 43 71 57
>> 
>> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og
>> meget mere.
>> 
>> --
>> Manage your subscription for the Freeipa-users mailing list:
>> https://www.redhat.com/mailman/listinfo/freeipa-users
>> Go to http://freeipa.org for more info on the project
>> 
>> --
>> Med venlig hilsen
>> 
>> Troels Hansen
>> 
>> Systemkonsulent
>> 
>> Casalogic A/S
>> 
>> 
>> T (+45) 70 20 10 63
>> 
>> M (+45) 22 43 71 57
>> 
>> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og
>> meget mere.
> 
> -- 
> Med venlig hilsen 
> 
> Troels Hansen 
> 
> Systemkonsulent 
> 
> Casalogic A/S 
> 
> 
> T (+45) 70 20 10 63 
> 
> M (+45) 22 43 71 57 
> 
> Red Hat, SUSE, VMware, Citrix, Novell, Yellowfin BI, EnterpriseDB, Sophos og 
> meget mere.


-- 
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project

Re: [Freeipa-users] Needs help understand this timeout issue

Reply via email to