On Thu, Aug 21, 2025 at 06:05:37PM +0000, L. Mark Stone via mailop wrote:

> IOW, if you are doing just one lookup a day (86400), and the average
> lookup failure rate is say 5%, then you'll have a delivery issue once
> every 20 days.

One should keep in mind that many resolvers cap their cache TTLs at
well under a day, for various good reasons.  So an authoritative TTL
of 1 day will often not translate to cache TTLs of 1 day.

> But with a TTL of 300, you can reasonably expect to have multiple
> delivery issues every day, given that same 5% error rate.

That 5% error rate is well outside the norm.  With a deliberately
complex workload of 40,000 MX queries for randomly chosen DNSSEC-signed
(DS RRs present at parent) domains, queried at a concurrency of 400
outstanding requests at a time, my unbound resolver returned:

      39272 ; NOERROR qr rd ra ad
        716 ; SERVFAIL qr rd ra
          8 ; NOERROR qr rd ra
          4 ; NXDOMAIN qr rd ra

Which is an error rate of 716/40000 or 1.79%.  However, some of the
domains in question are parked and just broken all the time, and
rerunning the same queries two more times yields:

      39280 ; NOERROR qr rd ra ad
        705 ; SERVFAIL qr rd ra
          8 ; NOERROR qr rd ra
          4 ; NXDOMAIN qr rd ra
          3 ; RetryLimitExceeded

      39283 ; NOERROR qr rd ra ad
        704 ; SERVFAIL qr rd ra
          8 ; NOERROR qr rd ra
          4 ; NXDOMAIN qr rd ra
          1 ; RetryLimitExceeded

So only 11 of the domains that failed initially are resolvable when
retried.  That's a transient error rate of 11/40000, or under 0.03%.
And when MX lookups fail, the message is deferred, and typically
retried in an hour or less.

Therefore, and especially for email, given that SMTP deliveries are
queued and retried, I don't see a compelling reason for long TTLs.
Anything over 60s is sufficient to amortise lookup latency at high
volume, and at low volume even an extra couple of seconds in email
delivery latency (when the auth servers are halfway around the world)
is unlikely to be a concern.

--
    Viktor.
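
P.S. For anyone who wants to redo the arithmetic, here is a minimal
Python sketch.  It rests on simplifying assumptions that are not part
of the measurements: exactly one cache-miss lookup per TTL expiry,
independent failures, and the 11 domains that recovered on retry taken
as a rough proxy for the transient failure rate.  The function name
expected_failures_per_day is purely illustrative.

```python
SECONDS_PER_DAY = 86400


def expected_failures_per_day(ttl_seconds, failure_rate):
    """Expected failed lookups per day for one domain.

    Assumes (simplification) one cache-miss lookup each time the TTL
    expires, and that each lookup fails independently with failure_rate.
    """
    lookups_per_day = SECONDS_PER_DAY / ttl_seconds
    return lookups_per_day * failure_rate


# Counts taken from the resolver output quoted in this message.
total_queries = 40_000
first_pass_servfail = 716
recovered_on_retry = 11  # NOERROR/ad count rose from 39272 to 39283

raw_rate = first_pass_servfail / total_queries       # includes always-broken domains
transient_rate = recovered_on_retry / total_queries  # failures that went away on retry

print(f"raw error rate:       {raw_rate:.2%}")
print(f"transient error rate: {transient_rate:.4%}")

# Compare the quoted 5% figure with the measured transient rate.
for ttl in (86400, 300, 60):
    for label, rate in (("5% (quoted)", 0.05), ("transient", transient_rate)):
        per_day = expected_failures_per_day(ttl, rate)
        print(f"TTL {ttl:>5}s, {label:<12}: {per_day:.4f} failed lookups/day")
```

Under those assumptions a TTL of 300 means 288 lookups a day, so the
quoted 5% rate gives about 14 failed lookups a day, while the measured
transient rate gives well under one.  Either way, a failed lookup only
defers the message until the next queue run.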