I’ve received incredibly good support from this mailing list in the past, and I 
am hoping somebody can help me again.  I have spent a few days on this and can’t 
figure out how to address the issue.  On my DCs I am seeing excessive 
ldap_search_ext and sdap_get_generic_ext_recv timeouts, caused solely by running 
the ‘id’ command on sssd clients.  The problem appears only when I parallelize 
lookups for an ‘uncached’ user (i.e. one for whom I have never performed an 
initial lookup).  Individual one-off lookups for a single uncached user on a 
single system almost always work fine, which leads me to believe this is a 
performance tuning issue.

We are an academic research computing unit (i.e. we run an HPC cluster), and I 
need to be able to look up the same user in parallel (using the id command) 
across a relatively large number of systems, for example when spawning jobs that 
require many CPU cores and/or large amounts of memory.  Right now I am running 
about 50 parallel lookups for the same user to induce this problem.
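For anyone who wants to reproduce this, a minimal Python sketch of such a test 
harness follows. It is only an illustration under stated assumptions: it starts 
every command at the same time (one thread each) and records whether each call 
succeeded and how long it took. Running ‘id’ from a single host would exercise 
only that host's sssd, so the commented usage fans out over ssh; the user name, 
the node names and the use of ssh are hypothetical placeholders, not our setup.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(argv, timeout=120.0):
    """Run one command; return (argv, succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
        ok = proc.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        # A hung lookup or a missing binary both count as a failure.
        ok = False
    return argv, ok, time.monotonic() - start


def parallel_lookups(commands, timeout=120.0):
    """Start every command at once (one thread each); results keep input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(lambda argv: timed_run(argv, timeout), commands))


def summarize(results):
    """Return (number of lookups, number that failed, slowest elapsed seconds)."""
    failed = sum(1 for _, ok, _ in results if not ok)
    slowest = max((elapsed for _, _, elapsed in results), default=0.0)
    return len(results), failed, slowest


# Hypothetical usage: one uncached AD user looked up on 50 nodes at once.
# USER = "someuser@ad.example.edu"
# HOSTS = [f"node{i:02d}" for i in range(50)]
# print(summarize(parallel_lookups([["ssh", h, "id", USER] for h in HOSTS])))
```

A non-zero failure count from summarize, with the slowest lookup taking close to 
the configured search timeout, may be consistent with the DC-side timeouts 
described above, though that is an inference, not a diagnosis.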

Here is some background information:

1) I have read Jakub's “Anatomy of an SSSD Lookup” and “Performance Tuning of 
SSSD for large IPA-AD deployments”, and have implemented the recommendations 
from the performance tuning doc, including moving the sssd cache to tmpfs.
2) We are on ipa-server 4.4.0-14.el7_3.4 with a trusted AD domain; all of the 
users and groups we consume live in the trusted AD domain.  We have two domain 
controllers, each a RHEL 7.3 VM with 2 CPUs and 6 GB of memory; neither DC is 
swapping.  Almost all (if not all) of our clients run sssd 1.14 or later, on 
RHEL 6 or 7.
3) I have tuned SSSD on the DCs and on all clients to include these options 
(the problem persists):
  a) ldap_opt_timeout = 60
  b) ldap_search_timeout = 60
4) On both DCs, I can clear the SSSD cache and look up all 2000 or so users in 
my environment with 40 concurrent lookups running locally on each DC (using 
UNIX job control).  All 2000 lookups complete without any failures on either 
DC, and I have ‘pre-populated’ the SSSD cache on both DCs this way.
5) I have made no performance tuning changes beyond those described above.
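To confirm that the timeouts from item 3 really are present on every host, a 
small Python check like the one below could be run against each machine's 
sssd.conf. It is a sketch under assumptions: it inspects only [domain/...] 
sections, treats 60 seconds as the required minimum, and the function name 
missing_timeouts is hypothetical.

```python
import configparser

# Options from item 3 and the minimum value (seconds) we expect on every host.
REQUIRED = {"ldap_opt_timeout": 60, "ldap_search_timeout": 60}


def missing_timeouts(conf_text, required=REQUIRED):
    """Return {section: [problems]} for [domain/...] sections lacking the tuning."""
    # interpolation=None so '%' in options such as full_name_format is not parsed.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    problems = {}
    for section in parser.sections():
        if not section.startswith("domain/"):
            continue
        for option, minimum in required.items():
            value = parser.get(section, option, fallback=None)
            if value is None:
                problems.setdefault(section, []).append(f"{option} not set")
            elif int(value) < minimum:
                problems.setdefault(section, []).append(
                    f"{option}={value} < {minimum}")
    return problems
```

An empty result means every domain section carries both options at 60 seconds 
or more; it says nothing about whether those values are the right ones.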
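The cache pre-population from item 4 can be approximated in Python, as a sketch 
rather than the shell job-control loop actually used. It keeps at most 40 lookups 
in flight and returns the users whose lookup failed. The lookup function is an 
assumption: it calls getpwnam and getgrouplist through NSS (and therefore sssd), 
which is roughly what ‘id’ does, but unlike ‘id’ it does not resolve group IDs 
to names.

```python
import os
import pwd
from concurrent.futures import ThreadPoolExecutor


def nss_lookup(user):
    """Approximate `id USER`: fetch the passwd entry, then the group list."""
    entry = pwd.getpwnam(user)  # raises KeyError if NSS/sssd cannot find the user
    return os.getgrouplist(entry.pw_name, entry.pw_gid)


def prepopulate(users, lookup=nss_lookup, workers=40):
    """Look up every user with at most `workers` lookups in flight.

    Returns the names whose lookup failed, in input order.
    """
    def attempt(user):
        try:
            lookup(user)
            return None
        except (KeyError, OSError):
            return user

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [u for u in pool.map(attempt, users) if u is not None]
```

For example, prepopulate(names) with a hypothetical list of the ~2000 AD user 
names would return an empty list when, as reported above, every lookup succeeds.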

Would anybody be able to advise on any tuning that would be required 
(presumably on the DCs) to support 50 parallel lookups without 
sdap_get_generic_ext_recv or ldap_search_ext timeouts?  Should I be able to do 
this sort of thing with relative ease?  I was hoping it would just work, but 
based on my fairly extensive testing it doesn’t.  Any advice would be greatly 
appreciated.

Thank you,

Dan Sullivan

Manage your subscription for the Freeipa-users mailing list:
Go to http://freeipa.org for more info on the project
