I've read draft-tale-dnsop-serve-stale-00. Overall I think we need something like this in practice. Even if it technically violates the current protocol standards, the motivation behind it is a real operational issue, and I believe we should provide some standards-compliant mitigation. Of course, the end result may be very different from what's currently described in this draft, but I think this is a good starting point.

I have a few minor comments on the current version:

- I suspect it should include 'Updates: 1035 (if approved)' in the top boilerplate.

- Section 3

     If the answer has not been completely determined by the time the client response timer has elapsed, the resolver SHOULD then check its cache to see whether there is expired data that would satisfy the request. If so, it adds that data to the response message and SHOULD set the TTL of each expired record in the message to 1 second.

  The recommended value of the client response timer is 1.8 seconds, so end clients will see this amount of delay for queries for which this technique is needed (most notably while the corresponding authoritative servers are under a DoS attack and unreachable). I wonder whether this is really acceptable in terms of user experience. According to the draft this implementation has been actually used in the field (correct?). If so, were the end users okay with the delay?

  Also, it's not clear to me why the TTL is set to 1 second. Since it's actually expired, a zero TTL seems to be a more sensible choice here (a similar feature of unbound uses a zero TTL). If there's a specific reason to avoid 0, it would be better to explain it explicitly.

- Section 4

     Canonical Name (CNAME) records mingled in the expired cache with other records at the same owner name can cause surprising results. This was observed with an initial implementation in BIND, where a hostname changed from having a CNAME record to an IPv4 Address (A) record. BIND does not evict CNAMEs in the cache when other types are received, which in normal operations is not an issue. However, after both records expired and the authorities became unavailable, the fallback to stale answers returned the older CNAME instead of the newer A.

  I suspect this is quite specific to internal implementation details of BIND, specifically that the RRsets of a name are maintained in a singly linked list, newer RRsets are prepended to the list, and on lookup the last one found is used if the list contains both a CNAME and the exact type (A in this example). Is my guess correct? If so, while this is really an interesting topic and probably worth sharing, it's probably better to clarify that it's specific to a particular implementation architecture.

--
JINMEI, Tatuya
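
P.S. To make the guess about the Section 4 behavior concrete, here is a toy Python model of it. This is only a sketch of my assumption, not BIND's actual code: the names (RRset, NameNode, add, lookup, allow_stale) are all made up for illustration, and the "prepend new RRsets" and "last match in the walk wins" rules are exactly the parts I'm guessing at.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RRset:
    rrtype: str
    rdata: str
    expires: float                    # absolute expiry time, in seconds
    next: Optional["RRset"] = None    # next (older) RRset at the same name


class NameNode:
    """All cached RRsets for one owner name, as a singly linked list."""

    def __init__(self) -> None:
        self.head: Optional[RRset] = None

    def add(self, rrtype: str, rdata: str, now: float, ttl: float) -> None:
        # Assumption: a new RRset is prepended, and caching an A does not
        # evict an older CNAME at the same name.
        self.head = RRset(rrtype, rdata, now + ttl, self.head)

    def lookup(self, qtype: str, now: float,
               allow_stale: bool = False) -> Optional[RRset]:
        found = None
        rrset = self.head
        while rrset is not None:
            usable = allow_stale or rrset.expires > now
            if usable and rrset.rrtype in (qtype, "CNAME"):
                found = rrset         # assumption: last match in the walk wins
            rrset = rrset.next
        return found


node = NameNode()
node.add("CNAME", "target.example.", now=0, ttl=60)
node.add("A", "192.0.2.1", now=100, ttl=60)    # the CNAME has expired by now

fresh = node.lookup("A", now=120)                     # only the A is unexpired
stale = node.lookup("A", now=300, allow_stale=True)   # both have expired
print(fresh.rrtype, stale.rrtype)
```

In this model the normal lookup never sees the conflict, because the old CNAME is already expired and skipped by the time the A is cached. Once stale data is allowed, both RRsets match and the older CNAME, being further down the list, is the one returned. A cache with a different layout would not necessarily behave this way, which is why I think the text should say the issue is implementation specific.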
