Hi,

Changing pam_id_timeout = 60 and krb5_auth_timeout = 60 on the client in 
conjunction with enabling tmpfs caching for /var/lib/sss/db on the DC appears 
to have helped significantly.  This issue is becoming much more difficult to 
reproduce, although I can still reproduce it.  Now, it appears that rapid 
successive invocations of the id command will yield a returned record. The 
timeout for the output specified below (i.e. the time it took the first command 
to return) was definitely less than 60 seconds, probably 10-20.  I am going to 
look into the tuning options for sssd, and would of course be interested in any 
advice you could provide in this regard.  AFAIK this issue currently only 
impacts users with a large number of groups (in fact I have only been able to 
cause this issue on one user after tuning as described above).  I am going to 
script a test and do a lookup for every single ID Override user in our 
environment to see what kind of a hit rate I get.  I’ll report back.  Thank you 
again for your help.

[root@cri-kcriwebgdp1 log]# id rcrist

id: rcrist: No such user
[root@cri-kcriwebgdp1 log]# id rcrist
uid=339748142(rcrist) gid=339748142(rcrist) 
groups=339748142(rcrist),339801232(cri-aaa_static_hosting),788635799(adm-sde-clients),788600520(group
 policy creator owners),788602710(bsd exchange view only 
administrators),339792922(cri-all_users),788659064(aaa-static_hosting_groups),788601114(bsd$
 dns 
read),788609545(adm-trackitusers),339806103(cri-ciscat),788609528(adm-bsd-mis),788619855(adm-oua-dl),788615498(adm-himss),788637726(adm-dstmlist-dl),788600513(domain
 users),788601110(bsd$ all 
oua),788654299(cri-all_groups),788658170(ocr-sharepoint ocr 
members),788619946(adm-trackitreports),788638566(ocr-coi),788633650(#ocr-office-dl),788644425(ocr
 velos 
email),788609542(adm-testgroup1),788638733(ocr-dfc-users),788665477(med-section_shares-clinical
 trials (only)),788609532(adm-bsdis-print),788634332(ocr-clinical 
research),788609546(adm-tss),788658806(ocr-hiro),788672525(ocr-bsdvpn-allow),788640103(adm
 shpt srp 
contributors),788659092(ocr-sharepoint-velosupgrade),788639053(ocr-velos-tickets),788610719(adm-premigration-proofpoint),788635798(adm-sde-techs),788635657(adm-www-clinres),788653680(ocr-email-management),788663575(ocr-bsdirb),788658171(ocr-sharepoint
 irb members),788650124(ocr it),788609567(ors-teleform),788653595(ocr$ 
oua),788609341(ic),788646237(adm shpt ocr 
visitors),788609544(adm-trackittech),788671562(ocr-ocrepic),788652940(dma 
management)
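
A minimal sketch of the per-user lookup test mentioned above, in Python. It 
assumes the ID Override usernames are already available as a list; the function 
names, the retry count, and the delay are illustrative and not part of sssd or 
IPA. A user counts as a hit when `id` exits with status 0, and a failed lookup 
is retried so that users who only resolve on a second attempt are reported 
separately.

```python
import subprocess
import time


def id_lookup(user):
    """Return True if `id USER` resolves the user (exit status 0)."""
    result = subprocess.run(
        ["id", user],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def lookup_hit_rate(users, lookup=id_lookup, retries=1, delay=0.0):
    """Look up every user once, retrying failures up to `retries` times.

    Returns three lists: users resolved on the first try, users resolved
    only after a retry, and users that never resolved.
    """
    first_try, after_retry, misses = [], [], []
    for user in users:
        if lookup(user):
            first_try.append(user)
            continue
        for _ in range(retries):
            time.sleep(delay)  # give the server side time to fill its cache
            if lookup(user):
                after_retry.append(user)
                break
        else:
            # No retry succeeded (or retries == 0).
            misses.append(user)
    return first_try, after_retry, misses
```

Calling `lookup_hit_rate(users, retries=1, delay=30)` with the real user list 
would give the three lists, from which a hit rate can be computed. The 30 second 
delay is only a guess based on the 10-20 second timeout observed above.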

Dan


On Jul 15, 2016, at 8:22 AM, Sullivan, Daniel [AAA] 
<dsulliv...@bsd.uchicago.edu> wrote:

Jakub,

Sure, no problem, I am happy to provide the output that you are requesting.  
Thank you for taking the time to help me.

To answer your question, no record is returned (not missing groups). For 
example, the output of the failure was:

[root@cri-kcriwebgdp1 log]# id mjarsulic
id: mjarsulic: No such user

As per your request I have attached domain and nss logs for a lookup on the 
user ‘spott’ (command invoked: ‘id spott’ on the client, immediately after 
executing ‘sss_cache -E; service sssd stop; rm -rf /var/log/sssd/*; service 
sssd start’ on the client):

IPA - https://gist.github.com/dsulli99/4e45faa39474b9131be811e4a0779c40
NSS - https://gist.github.com/dsulli99/e2e10da34ff860ec15e56ea521eb8315

Not every record fails, and the behavior is inconsistent between lookups (i.e. 
sometimes a user will look up correctly, sometimes it will not), but it appears 
that in some situations a timeout is occurring in the nss logs (not in the 
failure above).  In these situations it looks to me like the query is 
dispatched to the DC, and the lookup times out.  If I wait a little bit and 
perform the lookup on the same user again, the record is returned (presumably 
because the DC eventually resolved and cached the query?).  We are migrating 
from CentrifyDC and have loaded 2000+ custom ID overrides into our Default 
Trust ID View; perhaps we will need to implement tmpfs caching for 
/var/lib/sss/db on the DC as described in your performance tuning document 
(https://jhrozek.wordpress.com/2015/08/19/performance-tuning-sssd-for-large-ipa-ad-trust-deployments/).
These timeouts look like:

(Fri Jul 15 07:21:04 2016) [sssd[nss]] [get_dp_name_and_id] (0x0400): Not a 
LOCAL view, continuing with provided values.
(Fri Jul 15 07:21:04 2016) [sssd[nss]] [sss_dp_issue_request] (0x0400): Issuing 
request for 
[0x41e750:1:b...@bsdad.uchicago.edu@bsdad.uchicago.edu]
(Fri Jul 15 07:21:04 2016) [sssd[nss]] [sss_dp_get_account_msg] (0x0400): 
Creating request for 
[bsdad.uchicago.edu][0x1][BE_REQ_USER][1][name=b...@bsdad.uchicago.edu:-]
(Fri Jul 15 07:21:04 2016) [sssd[nss]] [sbus_add_timeout] (0x2000): 0x1fa9020
(Fri Jul 15 07:21:04 2016) [sssd[nss]] [sss_dp_internal_get_send] (0x0400): 
Entering request 
[0x41e750:1:b...@bsdad.uchicago.edu@bsdad.uchicago.edu]
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [sbus_remove_timeout] (0x2000): 0x1fa9020
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [sbus_dispatch] (0x4000): dbus conn: 
0x1fa0730
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [sbus_dispatch] (0x4000): Dispatching.
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [sss_dp_get_reply] (0x1000): Got reply 
from Data Provider - DP error code: 3 errno: 110 error message: Connection 
timed out
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [nss_cmd_getby_dp_callback] (0x0040): 
Unable to get information from Data Provider
Error: 3, 110, Connection timed out
Will try to return what we have in cache
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [sss_dp_req_destructor] (0x0400): 
Deleting request: 
[0x41e750:1:b...@bsdad.uchicago.edu@bsdad.uchicago.edu]
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [reset_idle_timer] (0x4000): Idle timer 
re-set for client [0x1fa7fc0][22]
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [reset_idle_timer] (0x4000): Idle timer 
re-set for client [0x1fa7fc0][22]
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [client_recv] (0x0200): Client 
disconnected!
(Fri Jul 15 07:21:17 2016) [sssd[nss]] [client_close_fn] (0x2000): Terminated 
client [0x1fa7fc0][22]
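
A rough sketch of how replies like the one above could be pulled out of an nss 
debug log, in Python. It assumes the raw log file has one entry per line (the 
excerpt above is wrapped by the mail client), and it treats errno 110 as the 
timeout value because that is what the excerpt shows. The function name and the 
regular expression are illustrative and are not part of sssd.

```python
import re
from datetime import datetime

# Matches entries such as:
# (Fri Jul 15 07:21:17 2016) [sssd[nss]] [sss_dp_get_reply] (0x1000): Got reply
# from Data Provider - DP error code: 3 errno: 110 error message: ...
REPLY_RE = re.compile(
    r"^\((?P<ts>[^)]+)\) \[sssd\[[^ ]*\]\] \[sss_dp_get_reply\] "
    r"\(0x[0-9a-f]+\): Got reply from Data Provider - "
    r"DP error code: (?P<dp>\d+) errno: (?P<errno>\d+)"
)


def find_dp_timeouts(lines, timeout_errno=110):
    """Return (timestamp, dp_error_code) for each timed-out Data Provider reply.

    `timeout_errno` defaults to 110, the errno reported alongside
    "Connection timed out" in the log excerpt.
    """
    hits = []
    for line in lines:
        match = REPLY_RE.match(line)
        if match and int(match.group("errno")) == timeout_errno:
            # Timestamp format as printed in the log, e.g. Fri Jul 15 07:21:17 2016
            ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
            hits.append((ts, int(match.group("dp"))))
    return hits
```

Passing an open log file, for example `find_dp_timeouts(open(path))` with `path` 
pointing at the nss log under /var/log/sssd/, would list when each timeout was 
reported, which can then be compared against the lookup times on the client.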

I’m going to implement tmpfs caching on the DC; hopefully this will address at 
least a subset of these lookup failures.  I’ll report back with my findings.

Thank you again for your help.

Best,

Dan Sullivan




On Jul 15, 2016, at 7:12 AM, Jakub Hrozek 
<jhro...@redhat.com> 
wrote:

On Fri, Jul 15, 2016 at 12:00:56PM +0000, Sullivan, Daniel [AAA] wrote:
Lukas,

Thank you for your reply and inquiry.

First, to answer your question; yes, we have been using the 
default_domain_suffix for some time.  I am not sure what you mean by 
previously, but it is currently implemented and has been implemented prior to 
our 1.13 -> 1.14 upgrade.

And yes, I am assessing a possible software regression at the
current moment. It might be related to the default_domain_suffix
you are inquiring about.  Basically I am getting inconsistent
results on invocation of the id command with specifying the username
as ‘username’ or ‘username@fqdn’ on a client running 1.14
against a DC running 1.13 (i.e. no way to reliably invoke id against a
trusted domain account).  Sometimes the command will return a result,
and sometimes it will not.

No result or missing groups?

Looking at nss debug logs it appears that
a duplicate fqdn is being appended to the nss query as shown here (as
@bsdad.uchicago....@bsdad.uchicago.edu).
This lookup fails.

Yes, this is wrong, can you send me the full NSS and domain logs please?

--
Manage your subscription for the Freeipa-users mailing list:
https://www.redhat.com/mailman/listinfo/freeipa-users
Go to http://freeipa.org for more info on the project


********************************************************************************
This e-mail is intended only for the use of the individual or entity to which
it is addressed and may contain information that is privileged and confidential.
If the reader of this e-mail message is not the intended recipient, you are
hereby notified that any dissemination, distribution or copying of this
communication is prohibited. If you have received this e-mail in error, please
notify the sender and destroy all copies of the transmittal.

Thank you
University of Chicago Medicine and Biological Sciences
********************************************************************************
