Re: [Freeipa-users] 'Request is a replay'

Sigbjorn Lie Thu, 26 Jul 2012 06:46:52 -0700

On 07/26/2012 02:53 PM, Rob Crittenden wrote:

Sigbjorn Lie wrote:
On Wed, July 25, 2012 09:54, Sigbjorn Lie wrote:
On Tue, July 24, 2012 20:29, Simo Sorce wrote:
On Tue, 2012-07-24 at 10:22 +0200, Sigbjorn Lie wrote:
Hi,
I keep seeing the error message "Request is a replay" in our production environment, in various services using Kerberos such as ssh, sssd, automounter, squid and more, after the upgrade to RHEL 6.3 / IPA 2.2.
Jul 24 10:16:11 server027 sssd_be: GSSAPI Error: Unspecified GSS failure. Minor code may
provide more information (Request is a replay)
Searching Google seems to suggest that this is an error related to time. However, we have NTP configured (IPA servers as NTP servers), synchronized to external NTP servers. There has been no issue before, and I cannot find any issue with the time being out of sync on the machines where this is happening.
This error usually appears only when the same request is found in the replay cache. It shouldn't be related to time issues; in that case you usually get a clock-skew error.
Can you tell me what operation was being performed by sssd when you caught that error? Can you check whether another identical operation had been performed immediately before?
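
To illustrate the replay-cache behaviour described above, here is a minimal sketch in Python. It is not the MIT Kerberos implementation; the class name `ReplayCache`, the method `check_and_store`, and the 300-second window are assumptions chosen for illustration. The idea is that a service remembers each authenticator (client, server, client timestamp, microseconds) it has accepted within the allowed time window and rejects any request that presents the same tuple again.

```python
import hashlib
import time


class ReplayCache:
    """Toy replay cache (hypothetical, for illustration only)."""

    def __init__(self, window_seconds=300):
        # Entries older than this window are forgotten.
        self.window = window_seconds
        self._seen = {}  # digest of authenticator -> time it was first seen

    def check_and_store(self, client, server, ctime, cusec, now=None):
        """Return True if the request is new, False if it is a replay."""
        now = time.time() if now is None else now
        # Drop entries that have aged out of the window.
        self._seen = {k: t for k, t in self._seen.items()
                      if now - t <= self.window}
        raw = f"{client}|{server}|{ctime}|{cusec}".encode()
        key = hashlib.sha256(raw).hexdigest()
        if key in self._seen:
            return False  # same authenticator seen again: "Request is a replay"
        self._seen[key] = now
        return True
```

In this sketch, two requests carrying an identical client timestamp and microsecond value inside the window are indistinguishable, so the second one is rejected even if the clocks are perfectly synchronized.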
That being said, I do have 1 IPA server (out of 3) that has significantly higher CPU usage than the other 2. Its 15-minute load average sits between 0.85 and 0.95 the entire day, and the ns-slapd (389-ds) process is running at 100% most of the time.

Load: 1.02, 0.94, 0.87
In comparison, the other two IPA servers have a 15-minute average between 0.10 and 0.30 throughout the day, and the ns-slapd process is far from being such a CPU hog.
On the server with the high load, even a command such as "ipactl status" can take up to 20 seconds to complete: "Directory Service: RUNNING" returns after a second or so, and listing the rest of the services takes the remaining 19 seconds.
Also, the web interface on this particular IPA server is rendered unusable, returning "Limits exceeded for the query" for almost any action.
Restarting all the IPA services (ipactl restart) on the problematic host somewhat improves the situation, however that particular server quickly returns to having heavy load.
Using logconv.pl to analyze the dirsrv access log file shows that the server in question has the lowest rate of search queries, at 106 queries/min. The other servers have 710 search queries/sec and 168 queries/sec.
For modifications, all the IPA servers have about 5-6 queries/sec. For unindexed searches, the problematic server is the one with the lowest number. It does, however, have more than twice as many GSSAPI binds as the other servers, with over 61000 GSSAPI binds over a 17 hour period.
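
To put the bind count on the same scale as the per-minute and per-second figures above, the arithmetic can be sketched as follows. The function names are hypothetical; the only inputs are the 61000 binds and 17 hours quoted in the message, treated as exact values for the sake of the calculation.

```python
def binds_per_minute(binds, hours):
    """Average number of binds per minute over the given period."""
    return binds / (hours * 60)


def binds_per_second(binds, hours):
    """Average number of binds per second over the given period."""
    return binds / (hours * 3600)


print(f"{binds_per_minute(61000, 17):.1f} binds/min")  # about 59.8
print(f"{binds_per_second(61000, 17):.2f} binds/sec")  # about 1.00
```

So "over 61000 binds in 17 hours" works out to roughly one GSSAPI bind per second on average.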
The problematic server is a physical server with 2 x AMD 2.4GHz quad-core CPUs and 8GB of RAM.
This issue is also impacting all the clients, where I see random hangs with anything involving an LDAP or Kerberos query to the IPA servers.

Any suggestions?
Anyone ?
I am starting to see the replay error when using the "ipa" CLI tool as well, causing the request to fail with an error.

ipa dnsrecord-show example.com hostname
ipa: ERROR: Local error: SASL(-1): generic failure: GSSAPI Error: Unspecified GSS failure. Minor
code may provide more information (Request is a replay)
Sorry, I had started a reply yesterday and got side-tracked and never sent it.

I know that feeling. :)

As for the one server being busier than the others, how are your clients configured? Are you using DNS SRV records?

We use DNS SRV records for everything LDAP that supports them -> SSSD and the Linux automounter. Solaris clients, Red Hat 5 using nss_ldap, and NetApp use statically configured servers; however, this is the second server in the server list for these machines. The primary server got more than 7x more LDAP queries per minute, and the load on the primary is much, much lower. All Kerberos clients are using DNS SRV for lookups, no static configuration there.

I see some hiccups on the clients as well, when browsing NFS shares (looking up UIDs), unlocking a client etc. It would seem that these are related to the "faulty" IPA server with high load, as it seems to respond very slowly to a lot of LDAP queries too. I tried removing it from the DNS SRV records an hour ago, and things seem to run smoother. A few services are still doing lookups there though, and the load on the "faulty" server is still high even with fewer clients. The primary server that's now receiving most of the queries has barely increased in CPU usage at all.

As for the replay, are your servers running on bare metal or in VMs? How about the clients? This sure seems like a time issue.

The time is configured as it has been for a long time. The physical IPA servers are synchronized from external time sources, providing the rest of the network with time. We have 2 physical servers and 1 virtual server. I have looked into the time, and it does seem like everything is synchronized.


The number of clients has not changed much over the last few months.

These issues started appearing just after the upgrade to RHEL 6.3 / IPA 2.2.

Any suggestions on where to continue the troubleshooting?



Regards,
Siggi

