Chris,

We didn't see any problems either until the KDC was under heavy load. The
unpatched 1.9.1 was, and still is, running on our secondary KDC without
issue, and we had been using 1.9.1 in testing and development for months
without trouble. During the period when we saw the performance degradation,
the primary KDC handled 467,000 distinct AS/TGS requests, which works out to
roughly 43 requests per second (not counting the many retransmits). That is
typical of our primary production KDC's workload throughout the day, but no
other KDC of ours gets that much traffic; by contrast, our secondary KDC gets
a request once or twice a minute. So the performance problem seems to appear
only when the KDC is under heavy load.
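As a rough cross-check of these figures, the arithmetic below recovers the length of the slowdown window. It assumes that the 43 requests/second figure is the request count divided by the duration of the degradation period, which the message does not state explicitly. It also compares the primary's per-minute load with the secondary's "once or twice a minute".

```python
# Figures quoted in the message above.
requests_total = 467_000      # distinct AS/TGS requests during the slowdown
primary_rate_per_sec = 43     # approximate average rate on the primary KDC
secondary_per_min = (1, 2)    # "once or twice a minute" on the secondary KDC

# Assumption: rate = count / duration, so duration = count / rate.
implied_seconds = requests_total / primary_rate_per_sec
implied_hours = implied_seconds / 3600

# How many times busier the primary is than the secondary, per minute.
primary_per_min = primary_rate_per_sec * 60
ratios = [primary_per_min / s for s in secondary_per_min]

print(f"implied window: {implied_seconds:.0f} s (~{implied_hours:.1f} h)")
print(f"primary/secondary load: {min(ratios):.0f}x to {max(ratios):.0f}x")
```

Under that assumption the slowdown lasted about three hours. The primary sees roughly 1,300 to 2,600 times the secondary's traffic, which is consistent with the problem showing up on only one of the two KDCs.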

Jonathan

On Aug 9, 2011, at 4:23 AM, Chris Hecker wrote:

> 
> Just another data point: I'm not seeing this on my locally built 1.9.1
> (built without the attached patch):
> 
> real    0m41.409s
> user    0m3.358s
> sys     0m3.683s
> finished round 1
> 
> real    0m35.036s
> user    0m3.441s
> sys     0m3.658s
> finished round 2
> 
> real    0m44.344s
> user    0m3.363s
> sys     0m3.728s
> finished round 3
> 
> real    0m40.930s
> user    0m3.465s
> sys     0m3.973s
> finished round 4
> 
> I had to reduce the number of inner iterations to 300 because my machine
> is slow. The variance in the numbers above comes from the other load
> running on this machine.
> 
> Chris
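One informal way to read Chris's numbers: if a per-request replay-cache cost were growing during the run, later rounds should get steadily slower. This check assumes the four rounds ran back to back against the same KDC process. The sketch fits an ordinary least-squares slope to the four "real" times copied from the output above. It is a quick check, not a statistical test.

```python
from statistics import mean

# Wall-clock ("real") seconds for rounds 1-4, copied from the output above.
real_secs = [41.409, 35.036, 44.344, 40.930]
rounds = list(range(1, len(real_secs) + 1))

avg_round = mean(rounds)
avg_real = mean(real_secs)

# Ordinary least-squares slope: extra wall time per additional round.
slope = (
    sum((x - avg_round) * (y - avg_real) for x, y in zip(rounds, real_secs))
    / sum((x - avg_round) ** 2 for x in rounds)
)

print(f"mean {avg_real:.2f} s, range {min(real_secs):.2f}-{max(real_secs):.2f} s, "
      f"slope {slope:+.2f} s/round")
```

The fitted slope is about +0.8 s per round. That is small next to the roughly 9 s gap between the fastest and slowest rounds, which fits Chris's remark that other load on his machine dominates the variation.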
> 
> On 2011/08/08 11:21, Greg Hudson wrote:
>> On Mon, 2011-08-08 at 11:22 -0400, Jonathan Reams wrote:
>>> I did some performance testing on our test KDC and was able to
>>> reproduce the performance issue with 1.9.1.
>> 
>> I found a regression that would affect these tests, but I'm not sure it
>> accounts for your overall performance issues.
>> 
>> The KDC in krb5 1.9 isn't supposed to be using an on-disk replay cache,
>> but due to a bug, it is actually opening and reading a replay cache for
>> every TGS request, which is significantly less efficient than the 1.8
>> behavior (using a replay cache which stays open for the lifetime of the
>> KDC).
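The following deliberately simplified model illustrates why reopening the cache on every request is so much worse. The classes are hypothetical and do not reflect krb5's actual rcache code or file format. The first class re-reads every stored entry on each check, like the 1.9 bug. The second keeps one in-memory set for the life of the process, like the 1.8 behavior. Counting the entries read shows the quadratic total for the reopening version.

```python
class ReopeningReplayCache:
    """Hypothetical model of the 1.9 bug: each check re-reads the whole cache."""

    def __init__(self) -> None:
        self.stored: list[str] = []  # stands in for the on-disk file
        self.entries_read = 0        # running total of entries read

    def check_and_store(self, authenticator: str) -> bool:
        """Return False if authenticator is a replay, else store it and return True."""
        # "Open and read" the cache: every existing entry is read again.
        self.entries_read += len(self.stored)
        if authenticator in set(self.stored):
            return False
        self.stored.append(authenticator)
        return True


class PersistentReplayCache:
    """Hypothetical model of the 1.8 behavior: one cache open for the KDC's life."""

    def __init__(self) -> None:
        self.seen: set[str] = set()
        self.entries_read = 0  # nothing is re-read per request

    def check_and_store(self, authenticator: str) -> bool:
        """Return False if authenticator is a replay, else store it and return True."""
        if authenticator in self.seen:
            return False
        self.seen.add(authenticator)
        return True


n = 1000
reopening, persistent = ReopeningReplayCache(), PersistentReplayCache()
for i in range(n):
    reopening.check_and_store(f"auth-{i}")
    persistent.check_and_store(f"auth-{i}")
print(reopening.entries_read, persistent.entries_read)  # prints: 499500 0
```

For n requests, the reopening model reads 0 + 1 + ... + (n - 1) = n(n - 1)/2 entries in total. That is the O(n^2) shape described above, while the persistent model does a constant-time set lookup per request.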
>> 
>> In a test which runs in under five minutes, this regression produces
>> visible O(n^2) performance characteristics.  This would not necessarily
>> account for performance degradation over hours, as the performance drag
>> of the replay cache should become stable after five minutes.  It's
>> possible that the constant drag was enough to cause the KDC to fall
>> behind on the request load, but it's also possible that there's a second
>> problem which isn't so easily reproduced.
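The drag should level off once the expiry window is full. This assumes replay-cache entries are expired after a fixed window: the five minutes mentioned above, which matches Kerberos's default 300-second clock-skew allowance. The hypothetical simulation below holds the request rate constant at the 43 requests/second quoted at the top of this thread. It reports the number of live entries at the end of selected seconds.

```python
from collections import deque


def cache_size_by_second(rate_per_sec: int, window_sec: int, duration_sec: int) -> list[int]:
    """Return the number of live cache entries at the end of each simulated second.

    Hypothetical model: a constant request rate, one entry stored per request,
    and every entry expiring window_sec seconds after it was stored.
    """
    cache = deque()  # arrival second of each stored entry, oldest first
    sizes = []
    for t in range(duration_sec):
        # Expire entries that have aged out of the window.
        while cache and cache[0] <= t - window_sec:
            cache.popleft()
        # Store this second's requests.
        cache.extend([t] * rate_per_sec)
        sizes.append(len(cache))
    return sizes


sizes = cache_size_by_second(rate_per_sec=43, window_sec=300, duration_sec=900)
print(sizes[59], sizes[299], sizes[899])  # prints: 2580 12900 12900
```

The size plateaus at 43 x 300 = 12,900 entries. Suppose every request reread the full cache. That is an upper bound, since only TGS requests touch the replay cache. It would still come to about 43 x 12,900, or roughly 555,000 entry reads per second: a constant drag that could plausibly leave a busy KDC unable to keep up.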
>> 
>> I've attached a patch.  Note that there is a second, in-memory
>> "lookaside" cache with O(n^2) performance characteristics in the short
>> term, which holds queries for up to two minutes.  You may see a slight
>> degradation in performance in test cases due to this.  You can
>> temporarily rebuild the kdc directory with "make clean; make
>> CPPFLAGS=-DNOCACHE" if you want to remove this variable from your
>> performance tests.
>> 
>> 
>> 
>> 
>> ________________________________________________
>> Kerberos mailing list           [email protected]
>> https://mailman.mit.edu/mailman/listinfo/kerberos

