We run Exim4 and use djbdns as our local DNS cache. All is generally 
well and messages flow through our system in under a second.

But sometimes the upstream spam RBLs we rely on seem to "disappear", 
probably because they are suffering a DDoS attack. In those cases, Exim4 
message processing slows to a crawl, taking over 30 seconds per message. 
What's happening is that the DNS lookups for the disappeared RBL are 
timing out...

Whom to blame? Should our local DNS cache somehow remember an upstream 
timeout so that it can return something (what?) immediately? I'm not 
sure that it could do this. After all, a DNS cache is supposed to 
somewhat transparently return the same information as would be returned 
by the upstream server. But if lookups to the upstream server are timing 
out, how should the local cache convey that same fact? Presumably the 
correct behavior is for the cache to leave the query unanswered as well.

Thus I wonder: should Exim4 somehow have a limited built-in DNS cache 
that at least caches those DNS queries that result in a timeout? Similar 
to the callout database and the retry database, maybe Exim4 needs to 
keep a database of timed-out DNS queries?
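
To make the idea concrete, here is a rough sketch of what such a 
database of timed-out queries might look like. It is written in Python 
rather than Exim's C, it is not anything Exim actually provides, and 
every name in it (TimeoutCache, rbl_lookup, hold_seconds, the resolve 
callback) is hypothetical. It assumes the cache is keyed by RBL zone 
rather than by full query name, since each RBL query embeds a different 
IP address and would otherwise never hit the cache, and it assumes the 
resolver signals a timeout by raising TimeoutError.

```python
import time


class TimeoutCache:
    """Remember RBL zones whose lookups recently timed out (hypothetical)."""

    def __init__(self, hold_seconds=300.0, clock=time.monotonic):
        self.hold_seconds = hold_seconds  # how long to skip a failed zone
        self.clock = clock                # injectable so it can be tested
        self._expires = {}                # zone -> time the entry expires

    def is_blocked(self, zone):
        """Return True if the zone timed out within the hold period."""
        expiry = self._expires.get(zone)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # Hold period is over: forget the zone and allow a fresh try.
            del self._expires[zone]
            return False
        return True

    def record_timeout(self, zone):
        """Note that a lookup against this zone has just timed out."""
        self._expires[zone] = self.clock() + self.hold_seconds


def rbl_lookup(reversed_ip, zone, resolve, cache):
    """Query an RBL zone, skipping it while it is known to be timing out.

    resolve is any callable taking a DNS name; it is assumed to raise
    TimeoutError on timeout. None means "no answer, treat as deferred".
    """
    if cache.is_blocked(zone):
        return None  # answer immediately instead of waiting again
    try:
        return resolve(f"{reversed_ip}.{zone}")
    except TimeoutError:
        cache.record_timeout(zone)
        return None
```

With something like this, only the first message after an RBL vanishes 
would pay the full timeout; later messages would skip that zone until 
the hold period expires, at which point a single lookup retries it.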


Has anyone run into a similar problem? Found workarounds? Solutions?

I suspect that when the DNS system was designed, no one thought about 
DDoS attacks, or else they might have created both a SERVFAIL and an 
UPSTREAMSERVFAIL response (thus giving a cache a way of immediately 
informing a client that a server has failed, but that it isn't the cache 
itself that has failed!).

Similarly, when the Exim design decision was made that Exim itself would 
not cache DNS stuff, instead relying on a local DNS cache for that, RBLs 
and DDoS attacks were probably not on the radar screen.

Now that we're in this even newer brave new world, how best to proceed?

Alexander Perlis


-- 
## List details at http://lists.exim.org/mailman/listinfo/exim-users 
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
