Sumit, Thank you for taking the time to help me; I know you are very active on supporting this project and I am thankful that you are willing to help me analyze this data. As per your request here is the log data from SSSD from the time window you specified. I anonymized the data but used find/replace so AAA /BBB/CCC/domain is consistent.
I really appreciate your help. Thank you, Dan (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_process_result] (0x2000): Trace: sh[0x7f5b5c3eec20], connected[1], ops[0x7f5b5c3f7ed0], ldap[0x7f5b5c3df9a0] (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT] (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_get_generic_op_finished] (0x0400): Search result: Success(0), no errmsg set (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_destructor] (0x2000): Operation 153 finished (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_get_ad_override_done] (0x4000): No override found with filter [(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:AAA))]. (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_id_op_destroy] (0x4000): releasing operation connection (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_initgr_get_overrides_step] (0x1000): Processing group 1/4 (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_initgr_get_overrides_step] (0x1000): Fetching group BBB (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_id_op_connect_step] (0x4000): reusing cached connection (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_get_ad_override_connect_done] (0x4000): Searching for overrides in view [Default Trust View] with filter [(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:BBB))]. (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_print_server] (0x2000): Searching 10.50.178.21:389 (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:BBB))][cn=Default Trust View,cn=views,cn=accounts,dc=ipa,dc=cri,dc=uchicago,dc=edu]. (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 154 (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_add] (0x2000): New operation 154 timeout 60 (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_process_result] (0x2000): Trace: sh[0x7f5b5c3eec20], connected[1], ops[0x7f5b5c3f7ed0], ldap[0x7f5b5c3df9a0] (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_process_result] (0x2000): Trace: end of ldap_result list (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_process_result] (0x2000): Trace: sh[0x7f5b5c3eec20], connected[1], ops[0x7f5b5c3f7ed0], ldap[0x7f5b5c3df9a0] (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT] (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_get_generic_op_finished] (0x0400): Search result: Success(0), no errmsg set (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_destructor] (0x2000): Operation 154 finished (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_get_ad_override_done] (0x4000): No override found with filter [(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:BBB))]. (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_id_op_destroy] (0x4000): releasing operation connection (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_initgr_get_overrides_step] (0x1000): Processing group 2/4 (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_initgr_get_overrides_step] (0x1000): Fetching group CCC (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_id_op_connect_step] (0x4000): reusing cached connection (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_get_ad_override_connect_done] (0x4000): Searching for overrides in view [Default Trust View] with filter [(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:CCC))]. (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_print_server] (0x2000): Searching 10.50.178.21:389 (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:CCC))][cn=Default Trust View,cn=views,cn=accounts,dc=ipa,dc=cri,dc=uchicago,dc=edu]. (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 155 (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_add] (0x2000): New operation 155 timeout 60 (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_process_result] (0x2000): Trace: sh[0x7f5b5c3eec20], connected[1], ops[0x7f5b5c3f7ed0], ldap[0x7f5b5c3df9a0] (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_process_result] (0x2000): Trace: end of ldap_result list (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sbus_dispatch] (0x4000): dbus conn: 0x7f5b5c3f8850 (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sbus_dispatch] (0x4000): Dispatching. (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sbus_message_handler] (0x2000): Received SBUS method org.freedesktop.sssd.dataprovider.getAccountInfo on path /org/freedesktop/sssd/dataprovider (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sbus_get_sender_id_send] (0x2000): Not a sysbus message, quit (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_get_account_info_handler] (0x0200): Got request for [0x1][BE_REQ_USER][1][name=apache@domain] (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_attach_req] (0x0400): DP Request [Account #4293]: New request. Flags [0x0001]. (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_attach_req] (0x0400): Number of active DP request: 2 (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_id_op_connect_step] (0x4000): reusing cached connection (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_id_op_connect_step] (0x4000): reusing cached connection (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_get_ad_override_connect_done] (0x4000): Searching for overrides in view [Default Trust View] with filter [(&(objectClass=ipaUserOverride)(uid=apache))]. (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_print_server] (0x2000): Searching 10.50.178.21:389 (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(objectClass=ipaUserOverride)(uid=apache))][cn=Default Trust View,cn=views,cn=accounts,dc=ipa,dc=cri,dc=uchicago,dc=edu]. (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 156 (Fri Jan 20 08:43:30 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_add] (0x2000): New operation 156 timeout 60 (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_timeout] (0x1000): Issuing timeout for 155 (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_destructor] (0x1000): Abandoning operation 155 (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [generic_ext_search_handler] (0x0040): sdap_get_generic_ext_recv failed [110]: Connection timed out (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_get_ad_override_done] (0x0040): ipa_get_ad_override request failed. (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_id_op_destroy] (0x4000): releasing operation connection (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_initgr_get_overrides_override_done] (0x0040): IPA override lookup failed: 110 (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_id_get_groups_overrides_done] (0x0040): IPA resolve user groups overrides failed [110]. (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [be_mark_dom_offline] (0x1000): Marking subdomain domain offline (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [be_mark_subdom_offline] (0x1000): Marking subdomain domain as inactive (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_srv_ad_acct_lookup_done] (0x0040): ipa_get_*_acct request failed: [110]: Connection timed out. (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [110]: Connection timed out. (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_id_op_destroy] (0x4000): releasing operation connection (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_reply_std_set] (0x0080): DP Error is OK on failed request? (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_done] (0x0400): DP Request [Account #4292]: Request handler finished [0]: Success (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [_dp_req_recv] (0x0400): DP Request [Account #4292]: Receiving request data. (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_reply_list_success] (0x0400): DP Request [Account #4292]: Finished. Success. (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_reply_std] (0x1000): DP Request [Account #4292]: Returning [Success]: 0,110,Success (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_table_value_destructor] (0x0400): Removing [0:1:0x0001:1:1::domain:name=user@domain] from reply table (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_destructor] (0x0400): DP Request [Account #4292]: Request removed. (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_destructor] (0x0400): Number of active DP request: 1 > On Jan 20, 2017, at 10:27 AM, Sumit Bose <[email protected]> wrote: > > On Fri, Jan 20, 2017 at 03:41:46PM +0000, Sullivan, Daniel [CRI] wrote: >> Hi, >> >> I have some more information on this issue. I’m tracing it down through the >> slapd logs and I am continuing to struggle; I was hoping that somebody >> could possibly help me provided this additional information. >> >> On the domain logs (on the DC), I see an ldap_search_ext operation 155 with >> a timeout of 60: >> >> (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with >> [(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:S-1-5-21-xxx))][cn=Default >> Trust View,cn=views,cn=accounts,dc=xxx,dc=xxx,dc=uchicago,dc=edu]. >> (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 155 >> (Fri Jan 20 08:42:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [sdap_op_add] >> (0x2000): New operation 155 timeout 60 >> >> Then, on the slapd log for the server, I see the incoming request. Based on >> the result, it looks like the request completes successfully within 1 second: >> >> [20/Jan/2017:08:42:35.179890746 -0600] conn=1518 op=31 SRCH base="cn=Default >> Trust View,cn=views,cn=accounts,dc=xxx,dc=xxx,dc=uchicago,dc=edu" scope=2 >> filter="(&(objectClass=ipaOverrideAnchor)(ipaAnchorUUID=:SID:S-1-5-21-xxx))" >> attrs=ALL >> [20/Jan/2017:08:42:35.683485674 -0600] conn=1518 op=31 RESULT err=0 tag=101 >> nentries=0 etime=1 notes=U >> >> Then, subsequently, the domain log (on the DC), I see the same operation 155 >> timeout (expectedly almost a minute later). > > So it looks like ns-ldap send the response but it never reached SSSD in > time. Can you send what it happening in the SSSD logs between 08:42:34 > and 08:43:34 (if you prefer you can send it to me directly). Maybe there > is an operation which blocks SSSD for too long so that it returns too > late to the main loop to process the response? > > bye, > Sumit > >> >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_op_timeout] (0x1000): Issuing timeout for 155 >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_op_destructor] (0x1000): Abandoning operation 155 >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [generic_ext_search_handler] (0x0040): sdap_get_generic_ext_recv failed >> [110]: Connection timed out >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_get_ad_override_done] (0x0040): ipa_get_ad_override request failed. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_id_op_destroy] (0x4000): releasing operation connection >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_initgr_get_overrides_override_done] (0x0040): IPA override lookup >> failed: 110 >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_id_get_groups_overrides_done] (0x0040): IPA resolve user groups >> overrides failed [110]. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [be_mark_dom_offline] (0x1000): Marking subdomain domain offline >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [be_mark_subdom_offline] (0x1000): Marking subdomain domain as inactive >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_srv_ad_acct_lookup_done] (0x0040): ipa_get_*_acct request failed: >> [110]: Connection timed out. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [ipa_subdomain_account_done] (0x0040): ipa_get_*_acct request failed: [110]: >> Connection timed out. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [sdap_id_op_destroy] (0x4000): releasing operation connection >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_reply_std_set] (0x0080): DP Error is OK on failed request? >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [dp_req_done] >> (0x0400): DP Request [Account #4292]: Request handler finished [0]: Success >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] [_dp_req_recv] >> (0x0400): DP Request [Account #4292]: Receiving request data. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_req_reply_list_success] (0x0400): DP Request [Account #4292]: Finished. >> Success. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_req_reply_std] (0x1000): DP Request [Account #4292]: Returning >> [Success]: 0,110,Success >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_table_value_destructor] (0x0400): Removing >> [0:1:0x0001:1:1::domain:[email protected]] from reply table >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_req_destructor] (0x0400): DP Request [Account #4292]: Request removed. >> (Fri Jan 20 08:43:34 2017) [sssd[be[xxx.xxx.uchicago.edu]]] >> [dp_req_destructor] (0x0400): Number of active DP request: 1 >> >> It seems like whatever problem I have is a communication issue between sssd >> on the domain controller, and ns-slapd. I’m starting to venture down the >> road of doing trace/packet level debug logging on the LDAP server; this >> could be very time consuming and frustrating (I already have it enabled, I >> am starting to manipulate the log settings to yield potentially valuable >> data), I’m hoping that somebody might be able to provide some insight based >> on the information above; I can definitely lookup the user on both domain >> controllers & both IPA servers only use themselves for IPA servers. Thank >> you so much for reading and for your help. >> >> Dan >> >> >>> On Jan 19, 2017, at 4:15 PM, Sullivan, Daniel [CRI] >>> <[email protected]> wrote: >>> >>> Hi, >>> >>> I’ve received incredibly good support from this mailing list previously; I >>> am hoping that somebody can help me succeed in my ongoing efforts. I have >>> spent a few days on this at this point and I can’t seem to figure it out >>> how to address this issue. On my DCs I am seeing excessive ldap_search_ext >>> and sdap_get_generic_ext_recv timeouts created solely by the invocation of >>> the ‘id’ command on sssd clients. This problem seems to present itself >>> only when I parallelize lookups for an ‘uncached’ user (i.e. I have never >>> performed an initial lookup). Individual arbitrary one-off lookups for a >>> single uncached user on a single system almost always work fine. This >>> leads me to believe this is a performance tuning issue. >>> >>> We operate in an academic research computing unit (i.e. we have an HPC >>> cluster), and I need the ability to lookup the same user in parallel (using >>> the id command) across a relatively large number of systems, for example to >>> spawn jobs that require large amounts of CPU cores and/or memory. Right >>> now I am doing about 50 parallel lookups for the same user to induce this >>> problem. >>> >>> Here is some background information: >>> >>> 1) I have read Jakub's “Anatomy of an SSSD Lookup” as well as “Performance >>> Tuning of SSSD for large IPA-AD deployments”, as well as implemented >>> recommendations from the performance tuning doc, including moving the sssd >>> cache to tmpfs. >>> 2) We are on ipa-server 4.4.0-14.el7_3.4 using a trusted AD domain; all of >>> our consumed users and groups are in the AD trusted domain. We have two >>> domain controllers; each is a RHEL 7.3 VM with 6 GB of memory. Almost all >>> (if not all) of our clients are running at least sssd 1.14, and are all >>> RHEL 6/7. Neither DC is swapping, and both have 2 CPUs. >>> 3) I have tuned SSSD clients on the DCs and all clients to include these >>> options (the problem persists): >>> a) ldap_opt_timeout = 60 >>> b) ldap_search_timeout = 60 >>> 4) On both DCs, I can clear the SSSD cache, and lookup all 2000 or so users >>> in my environment with 40 concurrent lookups occurring locally on each DC >>> (using UNIX job control). I can process all 2000 lookups in this manner >>> without any failures (on either DC), and have ‘pre-populated’ the SSSD >>> cache on both DC’s by doing this. >>> 6) I have made no additional performance tuning changes other than what has >>> been described. >>> >>> Would anybody be able to advise on any potential tuning that would be >>> required (presumably on the DCs), to facilitate 50 parallel lookups without >>> experiencing sdap_get_generic_ext_recv or ldap_search_ext timeouts? >>> Should I be able to do this sort of thing with relative ease? I was hoping >>> this would be the sort of thing that would just work, but based on my >>> relatively extensive testing it doesn’t. Any advice anybody could provide >>> would be greatly appreciated. >>> >>> Thank you, >>> >>> Dan Sullivan >>> >>> >> >> >> -- >> Manage your subscription for the Freeipa-users mailing list: >> https://www.redhat.com/mailman/listinfo/freeipa-users >> Go to http://freeipa.org for more info on the project > > -- > Manage your subscription for the Freeipa-users mailing list: > https://www.redhat.com/mailman/listinfo/freeipa-users > Go to http://freeipa.org for more info on the project -- Manage your subscription for the Freeipa-users mailing list: https://www.redhat.com/mailman/listinfo/freeipa-users Go to http://freeipa.org for more info on the project
