On Tue, Sep 25, 2012 at 02:08:29PM -0700, Russ Allbery wrote:
> Jack Neely <[email protected]> writes:
>
> > Thanks for reading between the lines.  I don't have evidence that my
> > KDCs were overloaded, yet I got quite a few cannot reach KDC errors and
> > logins stopped working everywhere.
>
> > The slaves are HP G7 blades with 12GB of RAM and a 6 core Intel Xeon.  2
> > servers in one DC and the other slave (and master) in the other DC.
> > Each DC has its own firewall/vlan for the kerberos servers.  RHEL 5
> > running kerb 1.6.1.
>
> > My network engineers tell me that the firewall in one DC had 8000
> > concurrent connections from the offending IP address to the KDCs and
> > 4000 in the second DC.  (Oddly, the DC with only 1 slave.)  The KDCs
> > weren't able to handle other requests until the spike settled.
>
> Is it possible that, rather than overwhelming the KDC, you instead
> overwhelmed the UDP session table on your firewall?  Sometimes firewalls
> have surprisingly small UDP session tables, which can cause serious
> problems for Kerberos and for DNS servers.
>
> You're right that there are ill-behaved Kerberos applications that will
> spam authentication requests, but I tend to think of this similar to the
> problem with DNS, where there are ill-behaved resolvers that do the same
> thing.  Fixing them tends to be really hard, but answering Kerberos
> requests should normally be extremely fast.  It's usually easier to just
> ensure you can handle the load spikes than worry too much about fixing all
> the broken clients.  (Of course, the rate limiting path that you're going
> down is one way to do that.)
>
> We were quite concerned when we first looked at putting Kerberos KDCs
> behind a hardware firewall because of that session limit.  Our firewalls
> have a 100,000 UDP session limit and a fairly quick timeout.  One tuning
> that you can do on the hardware firewall, if that is the problem, is to
> reduce the UDP session length for Kerberos KDC traffic.  You're either
> going to get a reply and complete the transaction in under a minute (in
> practice, under 10 seconds) or it's never going to work anyway, so if, for
> example, your firewall is trying to remember sessions for an hour, you're
> just wasting memory and possibly DoSing your firewall.
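
To put rough numbers on the session-timeout point above, here is a minimal
sketch in Python.  It assumes every request arrives from a fresh source port
and so creates its own firewall session entry, and that an entry is held for
the full timeout; under those assumptions the steady-state table size is
simply arrival rate times timeout (Little's law).  The function name and the
choice of timeouts are illustrative, and the request rates are the ones
quoted in the reply below.  Real firewalls may reuse or expire entries
differently.

```python
def steady_state_sessions(requests_per_min: float, timeout_s: float) -> float:
    """Little's law: entries held = arrival rate (per second) * hold time (seconds).

    Assumes (hypothetically) one new session entry per request.
    """
    return requests_per_min / 60.0 * timeout_s


BASELINE_PER_MIN = 960    # average rate quoted in the reply below
SPIKE_PER_MIN = 600       # additional rate during the incident, same source
SESSION_LIMIT = 100_000   # UDP session limit mentioned in the quoted message

total_per_min = BASELINE_PER_MIN + SPIKE_PER_MIN

# Compare a long session timeout with the "under a minute" and
# "under 10 seconds" figures from the quoted message.
for label, timeout_s in [("1 hour", 3600), ("1 minute", 60), ("10 seconds", 10)]:
    entries = steady_state_sessions(total_per_min, timeout_s)
    share = entries / SESSION_LIMIT
    print(f"{label:>10}: ~{entries:,.0f} entries ({share:.1%} of limit)")
```

Under these assumptions an hour-long timeout keeps tens of thousands of
entries alive at this request rate, while a timeout of a minute or less
keeps the table close to empty.  The actual behaviour depends on how the
firewall in question tracks UDP flows.
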
After spending some quality time with my logs, I do about 1.3 million
kerberos requests a day or 960/min on average.  The incident that took out
the kerberos servers with an additional 600 hits/min (from the krb logs)
doesn't even make a spike on my graphs.  My late morning usage is higher.

So there's another piece to the puzzle.

Jack
-- 
Jack Neely <[email protected]>
Linux Czar, OIT Campus Linux Services
Office of Information Technology, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4  EA6B 213B 765F 3B6A 5B89
________________________________________________
Kerberos mailing list           [email protected]
https://mailman.mit.edu/mailman/listinfo/kerberos
