On 07/26/2012 09:37 AM, Sigbjorn Lie wrote:
> On 07/26/2012 02:53 PM, Rob Crittenden wrote:
>> Sigbjorn Lie wrote:
>>> On Wed, July 25, 2012 09:54, Sigbjorn Lie wrote:
>>>> On Tue, July 24, 2012 20:29, Simo Sorce wrote:
>>>>> On Tue, 2012-07-24 at 10:22 +0200, Sigbjorn Lie wrote:
>>>>>> Hi,
>>>>>> I keep seeing the error message "Request is a replay" in our
>>>>>> production environment, in various services using Kerberos such as
>>>>>> ssh, sssd, automounter, squid and others, after the upgrade to
>>>>>> RHEL 6.3 / IPA 2.2.
>>>>>> Jul 24 10:16:11 server027 sssd_be: GSSAPI Error: Unspecified GSS
>>>>>> failure.  Minor code may
>>>>>> provide more information (Request is a replay)
>>>>>> Searching Google seems to suggest that this is an error with time.
>>>>>> However, we have NTP configured (IPA servers as NTP servers),
>>>>>> synchronized to external NTP servers. There has been no issue
>>>>>> before, and I cannot find any issue with the time being out of
>>>>>> sync on the machines where this is happening.
>>>>> This error usually appears only when the same request is found in
>>>>> the replay cache. It shouldn't be related to time issues; in that
>>>>> case you usually get a clock-skew error.
>>>>> Can you tell me what operation was being performed by sssd when you
>>>>> caught that error? Can you check whether another identical
>>>>> operation had been performed immediately before?
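
To illustrate the replay-cache behaviour described above, here is a
minimal Python sketch. It is a toy model, not the actual krb5
implementation: the class name, the 300-second window and the choice of
(client, ctime, cusec) as the lookup key are assumptions made for
illustration only.

```python
import time


class ReplayCache:
    """Toy replay cache (hypothetical): remembers authenticators seen recently."""

    def __init__(self, window_seconds=300):
        # Assumed window; entries older than this are forgotten.
        self.window = window_seconds
        self.seen = {}  # (client, ctime, cusec) -> time the entry was stored

    def check_and_store(self, client, ctime, cusec, now=None):
        """Return True if the request is new, False if it is a replay."""
        now = time.time() if now is None else now
        # Forget entries that have aged out of the window.
        self.seen = {k: t for k, t in self.seen.items() if now - t <= self.window}
        key = (client, ctime, cusec)
        if key in self.seen:
            return False  # same authenticator seen again: "Request is a replay"
        self.seen[key] = now
        return True
```

In this model a second request carrying an identical key inside the
window is rejected regardless of how well the clocks agree, which is
why the error points at duplicated requests rather than at clock skew.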
>>>> That being said, I do have 1 IPA server (out of 3) that has
>>>> significantly higher CPU usage than the other 2. Its 15-minute load
>>>> average sits between 0.85 and 0.95 the entire day, and the
>>>> ns-slapd (389-ds) process is running at 100% CPU most of the time.
>>>> Load: 1.02, 0.94, 0.87
>>>> In comparison, the other two IPA servers have a 15-minute average
>>>> between 0.10 and 0.30 throughout the day, and their ns-slapd
>>>> process is far from being such a CPU hog.
>>>> On the server with the high load, running even a command such as
>>>> "ipactl status" can take up to 20 seconds to complete:
>>>> "Directory Service: RUNNING" returns after a second or so, and
>>>> listing the rest of the services takes the remaining 19 seconds.
>>>> Also the web interface on this particular IPA server is rendered
>>>> unusable, returning "Limits
>>>> exceeded for the query" for almost any action.
>>>> Restarting all the IPA services (ipactl restart) on the problematic
>>>> host somewhat improves the situation, but that particular server
>>>> quickly returns to heavy load.
>>>> Analyzing the dirsrv access log file with logconv.pl shows that the
>>>> server in question has the lowest search rate, at 106 queries/min.
>>>> The other servers have 710 search queries/min and 168 queries/min.
>>>> For modifications, all the IPA servers have about 5-6 queries/sec.
>>>> For unindexed searches, the problematic server has the lowest
>>>> number. It does, however, have more than twice as many GSSAPI binds
>>>> as the other servers, with over 61000 GSSAPI binds over a 17-hour
>>>> period.
>>>> The problematic server is a physical server with 2 x AMD 2.4GHz
>>>> Quad core CPU and 8GB of RAM.
>>>> This issue is also impacting all the clients, where I see random
>>>> hangs with anything involving an LDAP or Kerberos query to the IPA
>>>> servers.
>>>> Any suggestions?
>>> Anyone ?
>>> I am starting to see the replay error when using the "ipa" CLI tool
>>> as well, causing the request to fail with an error.
>>> ipa dnsrecord-show example.com hostname
>>> ipa: ERROR: Local error: SASL(-1): generic failure: GSSAPI Error:
>>> Unspecified GSS failure.  Minor
>>> code may provide more information (Request is a replay)
>> Sorry, I had started a reply yesterday and got side-tracked and never
>> sent it.
> I know that feeling. :)
>> For the one server that is busier than the others, how are your
>> clients configured? Are you using DNS SRV records?
> We use DNS SRV records for everything LDAP that does support it ->
> SSSD and Linux automounter. Solaris clients, Red Hat 5 using nss_ldap,
> and NetApp use statically configured machines, however this is the
> second server in the server list for these machines. The primary
> server got more than 7x more LDAP queries per minute, and the load on
> the primary is much, much lower. All kerberos clients are using DNS
> SRV for lookups, no static configuration there.
> I see some hiccups on the clients as well, when browsing NFS shares
> (looking up UIDs), unlocking a client etc. These seem to be related
> to the "faulty" IPA server with high load, as it seems to respond
> very slowly to a lot of LDAP queries too. I removed it from the DNS
> SRV records an hour ago, and things seem to run smoother. A few
> services are still looking up there though, and the load on the
> "faulty" server is still high even with fewer clients.
> The primary server that's now receiving most of the queries has
> barely increased in CPU usage at all.
>> For the replay, are your servers running on bare metal or in VMs? How
>> about the clients? This sure seems like a time issue.
> The time is configured as it has been for a long time. The physical
> IPA servers are synchronized from external time sources, providing the
> rest of the network with time. We have 2 physical servers and 1
> virtual server. I have looked into the time, and it does seem like
> everything is synchronized.
> The number of clients has not changed much over the last few months.
> These issues started appearing just after the upgrade to RHEL 6.3 /
> IPA 2.2.
> Any suggestions on where to continue the troubleshooting?
Was this issue ever resolved?
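
If it is still open, one way to answer the earlier question about which
operations trigger the error might be to tally the replay messages per
host and service. Below is a rough Python sketch, assuming the syslog
lines look like the sssd_be example quoted above once unwrapped onto a
single line. The regular expression and the function name count_replays
are my own assumptions, not part of any IPA or SSSD tool.

```python
import re
from collections import Counter

# Assumed syslog layout: "Mon DD HH:MM:SS host process[pid]: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) ([^:\[\s]+)(?:\[\d+\])?: (.*)$"
)


def count_replays(lines):
    """Count 'Request is a replay' messages per (host, process) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and "Request is a replay" in m.group(4):
            counts[(m.group(2), m.group(3))] += 1
    return counts
```

Feeding it the lines of the system log from an affected client, for
example via an open file object, returns a Counter keyed by host and
process name, which can then be compared against the timestamps of the
hangs.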

> Regards,
> Siggi
> _______________________________________________
> Freeipa-users mailing list
> Freeipa-users@redhat.com
> https://www.redhat.com/mailman/listinfo/freeipa-users

Thank you,
Dmitri Pal

Sr. Engineering Manager for IdM portfolio
Red Hat Inc.
