We run Exim4 and use djbdns as our local DNS cache. All is generally well and messages flow through our system in under a second.
But sometimes the upstream spam RBLs we rely on seem to "disappear", probably because they are suffering a DDoS attack. In those cases, Exim4 message processing slows to a crawl, taking over 30 seconds per message. What's happening is that the DNS lookups for the vanished RBL are timing out.

Whom to blame? Should our local DNS cache somehow remember an upstream timeout so that it can return something (what?) immediately? I'm not sure that it could do this. After all, a DNS cache is supposed to return, more or less transparently, the same information that the upstream server would return. But if lookups to the upstream server are timing out, how should the local cache convey that same fact? Presumably it handles the situation correctly by itself not responding to the query.

Thus I wonder: should Exim4 have a limited built-in DNS cache that at least caches those DNS queries that result in a timeout? Similar to the callout database and the retry database, maybe Exim4 needs to keep a database of timed-out DNS queries?

Has anyone run into a similar problem? Found workarounds? Solutions?

I suspect that when the DNS system was designed, no one thought about DDoS attacks, or else they might have created both a SERVFAIL and an UPSTREAMSERVFAIL response (thus giving a cache a way of immediately informing a client that a server has failed, and that it isn't the cache itself that has failed!). Similarly, when the Exim design decision was made that Exim itself would not cache DNS results, relying instead on a local DNS cache for that, RBLs and DDoS attacks were probably not on the radar screen. Now that we're in this even newer brave new world, how best to proceed?

Alexander Perlis
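
To make the "database of timed-out DNS queries" idea concrete, here is a minimal Python sketch. It is only an illustration of the technique, not anything Exim or djbdns provides: the names TimeoutCache, LookupTimedOut, hold_seconds and the lookup callable are all hypothetical, and I am assuming the underlying resolver signals a timeout by raising TimeoutError. After one timeout for a zone, further queries for that zone fail immediately until the hold-down period expires.

```python
import time


class LookupTimedOut(Exception):
    """Raised when a lookup timed out, either just now or recently."""


class TimeoutCache:
    """Remember which DNS zones recently timed out and fail fast for them.

    `lookup` is any callable that takes a name and raises TimeoutError when
    the query times out (a hypothetical interface, not an Exim/djbdns API).
    """

    def __init__(self, lookup, hold_seconds=300.0, clock=time.monotonic):
        self._lookup = lookup
        self._hold = hold_seconds
        self._clock = clock
        self._dead_until = {}  # zone -> time at which we may try it again

    def query(self, name, zone):
        expiry = self._dead_until.get(zone)
        if expiry is not None:
            if self._clock() < expiry:
                # Zone timed out recently: skip the network entirely.
                raise LookupTimedOut(f"{zone} is in hold-down")
            # Hold-down is over; forget the entry and try again.
            del self._dead_until[zone]
        try:
            return self._lookup(name)
        except TimeoutError:
            # Record the failure so later queries return immediately.
            self._dead_until[zone] = self._clock() + self._hold
            raise LookupTimedOut(f"{zone} timed out") from None
```

With something like this in front of the resolver, only the first message after an RBL vanishes would pay the full timeout; what the caller should do on a fast failure (treat the host as listed, not listed, or defer) is a separate policy question that this sketch leaves open.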
