At Mon, 11 Aug 2008 11:13:13 -0500, Walter Gould <[EMAIL PROTECTED]> wrote:
> Here's the latest in my DNS horror story...

[snip]

> I started named and watched 'rndc status'. After a few minutes this was
> the result:
>
> # rndc status
> version: 9.5.0-P2 ()
> number of zones: 365
> debug level: 0
> xfers running: 0
> xfers deferred: 0
> soa queries in progress: 0
> query logging is ON
> recursive clients: 4980/9900/10000
> tcp clients: 0/100
> server is up and running
>
> When the recursive clients reached this level (or shortly before)
> queries started timing out...
>
> /var/log/messages showed the all too familiar too many open sockets error:
>
> Aug 11 10:34:17 dnsnew named[24266]: error: socket: too many open file descriptors
> Aug 11 10:34:31 dnsnew last message repeated 1876 times
>
> My questions are -
> 1. Do you think I should increase the FDSETSIZE to 10,000 or some other
> crazily high number?
> 2. Is that excessive?

You could try this, and it *might* help, but if the server regularly
handles such a high number of recursive clients, I'm afraid it will just
trigger another scalability problem.

> 3. What other adverse effects might this cause on my server?

In general, allowing a large number of concurrent open sockets will make
the server busier, roughly in proportion to the number of sockets.

> 4. Am I the only one having problems with a) ISC patched BIND packages
> and b) Red Hat patched BIND rpms?

I don't know the answer to this question, but your operational
environment seems to be unusual in a few respects:

- it's acting both as an authoritative and as a caching server
- as an authoritative server, it's managing a fairly large number of
  zones (which may require resource-consuming operations such as zone
  transfers)
- as a caching server, it seems to be handling a high volume of queries
  (several thousand concurrent clients)

While we worked hard to make P2 as scalable as possible while keeping its
changes as conservative as possible, your environment may simply exceed
what that conservative implementation can handle.
I know operators don't like radical solutions, but I'd really like you to
give the beta version a try. At least the next beta versions (which will
hopefully be released later this week or early next week) should be much
more stable than the currently available ones, and should not be as
"radical" as you might think.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.
