On 27. 07. 22 19:42, internet-dra...@ietf.org wrote:
A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Domain Name System Operations WG of the IETF.
Title : Negative Caching of DNS Resolution Failures
Authors : Duane Wessels
William Carroll
Matthew Thomas
Filename : draft-ietf-dnsop-caching-resolution-failures-00.txt
I think this is an important clarification of the protocol and we
should adopt it and work on it.
I like the document up until the end of Section 2. After that I have
reservations about the specific proposals put forth in Section 3.
I hope this will kick off a discussion; please don't take my points
personally, I'm only questioning the technical aspects.
3. DNS Negative Caching Requirements
3.1. Retries and Timeouts
A resolver MUST NOT retry more than twice (i.e., three queries in
total) before considering a server unresponsive.
This document does not place any requirements on timeout values,
which may be implementation- or configuration-dependent. It is
generally expected that typical timeout values range from 3 to 30
seconds.
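If I read this correctly, the rule amounts to something like the
following (Python sketch; the 3-second timeout, the simulated loss
rate, and all names are mine, the draft leaves timeouts open):

    import random

    MAX_QUERIES = 3   # initial query + at most two retries, per the draft
    TIMEOUT = 3.0     # seconds; illustrative only, not from the draft

    def query_server(server_ip, qname, qtype, qclass):
        """Stand-in for one UDP query; here it just simulates 20% loss."""
        return None if random.random() < 0.2 else {"rcode": "NOERROR"}

    def resolve(server_ip, qname, qtype, qclass):
        for _ in range(MAX_QUERIES):
            response = query_server(server_ip, qname, qtype, qclass)
            if response is not None:
                return response
        return None   # three unanswered queries: server is "unresponsive"

    print(resolve("192.0.2.1", "example.com.", "A", "IN"))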
I'm curious about the reasoning behind these numbers.
My motivation:
A random packet drop or a temporarily saturated/malfunctioning link
should not cause the resolver to fail for several seconds. As an
extreme case, think of a validating resolver on a laptop forwarding
elsewhere. Should two packet drops really cause it to SERVFAIL for
several seconds?
Related to this, I have a fundamental objection:
IMHO we should NOT be inventing flow control from scratch ourselves.
On the contrary, we should be borrowing prior art from existing flow
control algorithms and adapting them if necessary.
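For instance, TCP's retransmission timer (RFC 6298) already answers
the question of how long to wait and how to back off, adaptively and
per peer. A rough sketch of the same idea applied per upstream server
(my own adaptation, not something the draft proposes):

    class Rfc6298Timer:
        """Retransmission timeout per RFC 6298, one instance per
        upstream server. (My adaptation; not part of the draft.)"""
        ALPHA, BETA, K = 1/8, 1/4, 4

        def __init__(self):
            self.srtt = None
            self.rttvar = None
            self.rto = 1.0   # RFC 6298 initial RTO, in seconds

        def on_rtt_sample(self, rtt):
            if self.srtt is None:   # first measurement
                self.srtt = rtt
                self.rttvar = rtt / 2
            else:                   # update RTTVAR before SRTT, per the RFC
                self.rttvar = ((1 - self.BETA) * self.rttvar
                               + self.BETA * abs(self.srtt - rtt))
                self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
            self.rto = max(1.0, self.srtt + self.K * self.rttvar)

        def on_timeout(self):
            self.rto = min(self.rto * 2, 60.0)  # exponential backoff, capped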
3.2. TTLs
Resolvers MUST cache resolution failures for at least 5 seconds.
Resolvers SHOULD employ an exponential backoff algorithm to increase
the amount of time for subsequent resolution failures. For example,
the initial TTL for negatively caching a resolution failure is set to
5 seconds. The TTL is doubled after each retry that results in
another resolution failure. Consistent with [RFC2308], resolution
failures MUST NOT be cached for longer than 5 minutes.
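For reference, here is my reading of the quoted algorithm (Python
sketch):

    INITIAL_TTL = 5   # seconds, the draft's mandated minimum
    MAX_TTL = 300     # 5 minutes, per RFC 2308

    def next_failure_ttl(previous_ttl=None):
        """TTL for negatively caching a resolution failure."""
        if previous_ttl is None:
            return INITIAL_TTL
        return min(previous_ttl * 2, MAX_TTL)

    # Progression: 5, 10, 20, 40, 80, 160, 300 (capped) -- the
    # 5-minute cap is reached on the seventh consecutive failure.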
My motivation: Rapid recovery.
Why 5 seconds? Why not 1? Or why not 0.5 s? ... I would like to see
the reasoning behind the specific numbers.
IMHO most problems are caused by unlimited retries, and as soon as _a_
limit is in place the problem is alleviated; with exponential backoff
we should be able to start small. Even starting at 0.5 s, ten
doublings already exceed the 5-minute cap (0.5 * 2^10 = 512 s). I'm
not sure that a specific number should be mandated.
3.3. Scope
Resolution failures MUST be cached against the specific query tuple
<query name, type, class, server IP address>.
Why was this tuple selected? Why not <class, zone, server IP> for,
say, timeouts? Or even just <server IP> for timeouts?
What about the transport protocol and its parameters (UDP, TCP,
DoT, ...)?
My motivation:
- Simplify cache management.
- Imagine an attacker attempting to misuse this new cache. The cache
has to be bounded in size and has to somehow manage overflow, etc.
Generally, I think this MUST is too prescriptive. It should allow for
less specific caching if an implementation decides it is a better fit
for a given type of failure and configuration, or depending on
operational conditions.
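To illustrate what I mean, the key granularity could depend on the
failure type (a sketch with my own naming; nothing like this is in
the draft):

    def failure_cache_key(failure_type, qname, qtype, qclass,
                          server_ip, transport):
        """Cache key whose granularity depends on the failure type."""
        if failure_type == "timeout":
            # A timeout tells us about the server (or the path to it),
            # not about this particular qname/qtype.
            return (server_ip, transport)
        # SERVFAIL, FORMERR, ... may well be query-specific.
        return (qname, qtype, qclass, server_ip, transport)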
--
Petr Špaček