On 27. 07. 22 19:42, internet-dra...@ietf.org wrote:
A New Internet-Draft is available from the on-line Internet-Drafts directories.
This draft is a work item of the Domain Name System Operations WG of the IETF.

         Title           : Negative Caching of DNS Resolution Failures
         Authors         : Duane Wessels
                           William Carroll
                           Matthew Thomas
         Filename        : draft-ietf-dnsop-caching-resolution-failures-00.txt

I think this is an important clarification to the protocol and we should adopt it and work on it.

I like the document up until the end of section 2.

After that I have reservations about the specific proposals put forth in section 3.

I hope this will kick off a discussion; please don't take the points personally, I'm questioning the technical aspects.

3.  DNS Negative Caching Requirements

3.1.  Retries and Timeouts

   A resolver MUST NOT retry more than twice (i.e., three queries in
   total) before considering a server unresponsive.

   This document does not place any requirements on timeout values,
   which may be implementation- or configuration-dependent.  It is
   generally expected that typical timeout values range from 3 to 30
   seconds.

I'm curious about the reasoning behind this.

My motivation:
A random packet drop or a temporarily saturated/malfunctioning link should not cause the resolver to fail for several seconds. As an extreme case, think of a validating resolver on a laptop forwarding elsewhere. Should two packet drops really cause it to SERVFAIL for several seconds?
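
To put numbers on it (the 3 s timeout is my assumption, at the low end of the range the draft expects):

    # Hypothetical worst case: all queries to the forwarder are lost.
    timeout = 3.0        # seconds per query; my assumption, draft expects 3-30 s
    max_queries = 3      # initial query + two retries, per section 3.1
    min_failure_ttl = 5  # seconds, the floor proposed in section 3.2

    time_to_servfail = max_queries * timeout            # 9 seconds of waiting
    client_outage = time_to_servfail + min_failure_ttl  # >= 14 s of SERVFAIL
    print(time_to_servfail, client_outage)

With 30-second timeouts the same arithmetic gives 95 seconds.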

Related to this, I have a fundamental objection:
IMHO we should NOT be inventing flow control from scratch. On the contrary, we should borrow prior art from existing flow-control algorithms and adapt it where necessary.
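
For illustration, a minimal sketch of what borrowing could look like: the classic TCP retransmission-timeout estimator from RFC 6298 (Jacobson/Karels), kept per server. The constants are the RFC's; the sub-second floor and the application to DNS are my strawman, not a proposal for this draft.

    class RtoEstimator:
        """RFC 6298-style smoothed RTT estimator, one instance per server."""

        def __init__(self):
            self.srtt = None    # smoothed round-trip time
            self.rttvar = None  # round-trip time variance
            self.rto = 1.0      # initial retransmission timeout, seconds

        def on_response(self, rtt):
            # Update the estimate from each measured round trip.
            if self.srtt is None:
                self.srtt = rtt
                self.rttvar = rtt / 2
            else:
                self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt)
                self.srtt = 0.875 * self.srtt + 0.125 * rtt
            # RFC 6298 uses a 1 s minimum; a sub-second floor (my assumption)
            # seems more appropriate for DNS.
            self.rto = max(0.2, self.srtt + 4 * self.rttvar)

        def on_timeout(self):
            # Exponential backoff on loss, capped, as in RFC 6298 section 5.
            self.rto = min(self.rto * 2, 60.0)

The point is not this particular estimator; the point is that initial values, backoff factors and caps have already been tuned for decades elsewhere, and we should reuse that work.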


3.2.  TTLs

   Resolvers MUST cache resolution failures for at least 5 seconds.
   Resolvers SHOULD employ an exponential backoff algorithm to increase
   the amount of time for subsequent resolution failures.  For example,
   the initial TTL for negatively caching a resolution failure is set to
   5 seconds.  The TTL is doubled after each retry that results in
   another resolution failure.  Consistent with [RFC2308], resolution
   failures MUST NOT be cached for longer than 5 minutes.

My motivation: Rapid recovery.

Why 5 seconds? Why not 1? Or why not 0.5 s? ... I would like to see the reasoning behind the specific numbers.

IMHO most of the problem is caused by unlimited retries; as soon as _a_ limit is in place the problem is alleviated, and with exponential backoff we should be able to start small. I'm not sure a specific number should be mandated.
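
To illustrate (0.5 s here is an example, not a proposal): with the same doubling as above but a smaller starting point, a persistently broken server is throttled almost as quickly, while a single random drop costs clients only half a second.

    def failure_ttl(initial, consecutive_failures, cap=300):
        # Same doubling as the draft's example, parameterized starting TTL.
        return min(initial * 2 ** consecutive_failures, cap)

    # Starting at 0.5 s the 5-minute cap is reached after 10 failures:
    # [0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 300]
    print([failure_ttl(0.5, n) for n in range(11)])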


3.3.  Scope

   Resolution failures MUST be cached against the specific query tuple
   <query name, type, class, server IP address>.

Why was this tuple selected? Why not <class, zone, server IP address> for, say, timeouts? Or just <server IP address> for timeouts?
And what about the transport protocol and its parameters (TCP, UDP, DoT, ...)?
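
To make the alternatives concrete (the type names are mine):

    from typing import NamedTuple

    class PerQueryKey(NamedTuple):
        # The draft's MUST: <query name, type, class, server IP address>.
        qname: str
        qtype: int
        qclass: int
        server_ip: str

    class PerZoneKey(NamedTuple):
        # Possible alternative for timeouts: <class, zone, server IP address>.
        qclass: int
        zone: str
        server_ip: str

    class PerServerKey(NamedTuple):
        # Coarsest alternative for timeouts: <server IP address> only.
        server_ip: str

A timeout tells us something about the server (or the path to it), not about the query name, so for that failure type the coarser keys arguably capture the same information in a single entry.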

My motivation:
- Simplify cache management.
- Imagine an attacker attempting to misuse this new cache. The cache has to be bounded in size, and it has to somehow manage overflow, etc. (see the strawman below).

Generally I think this MUST is too prescriptive. It should allow less specific caching if an implementation decides that is a better fit for a given type of failure and configuration, or depending on operational conditions.
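
As a strawman for the bounded-size concern above (the size and the LRU eviction policy are arbitrary choices of mine):

    import time
    from collections import OrderedDict

    class FailureCache:
        """Bounded negative cache with LRU eviction on overflow."""

        def __init__(self, max_entries=10_000):
            self.max_entries = max_entries
            self.entries = OrderedDict()  # key -> expiry timestamp

        def insert(self, key, ttl):
            if key in self.entries:
                self.entries.move_to_end(key)
            elif len(self.entries) >= self.max_entries:
                # Overflow policy: evict the least recently used entry.
                # An attacker varying qnames can force exactly this path
                # under the per-query key, flushing legitimate entries;
                # a coarser key keeps the cache naturally bounded.
                self.entries.popitem(last=False)
            self.entries[key] = time.monotonic() + ttl

        def lookup(self, key):
            expiry = self.entries.get(key)
            if expiry is None or expiry < time.monotonic():
                self.entries.pop(key, None)
                return None
            self.entries.move_to_end(key)
            return expiry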

--
Petr Špaček

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
