Hello Duane & others, thank you for your response. Comments inline below.
On Thu, 2023-06-29 at 23:58 +0000, Wessels, Duane wrote:
> >
> > ## 2.2
> >
> > The first paragraph correctly mentions "policy reasons". The second
> > paragraph correctly says "they are not authoritative". I am not sure not
> > being authoritative can be considered a policy reason, so perhaps these two
> > paragraphs can be connected with an "or"?
>
> I see your point. We propose this change to the introduction sentence:
>
>    A name server returns a message with the RCODE field set to REFUSED
>    when it refuses to process the query, e.g., for policy or other reasons.

Works for me.

> >
> > ## 3.1
> >
> > "A resolver MUST NOT retry a given query over a server's transport more
> > than twice" - should this be clarified to say "in a short period of time"
> > or something like that? Clearly a retry is allowed *eventually*.
>
> For reference, here’s the sentence in question at the start of 3.1:
>
>    A resolver MUST NOT retry a given query over a server's transport more
>    than twice (i.e., three queries in total) before considering the
>    server's transport unresponsive for that query.
>
> We feel that “a given query” and “for that query” in the sentence
> sufficiently limit the scope here, and there is no need to qualify it by
> some amount of time.
>
> As an example, let’s say that a recursive has been asked to look up
> www.example.com (our “given” query). The example.com zone has two name
> servers, each of which has two IP addresses, and (presumably) two transports.
> It can send 3 queries to 199.43.135.53 over UDP (then that transport is
> unresponsive), 3 queries to 199.43.133.53 over UDP, same over TCP, over IPv6,
> and so on. In total the recursive can send 2x2x2x3 = 24 queries before it
> has to give up if all servers and all transports are unresponsive. At this
> point the resolver gives up on that query and returns SERVFAIL.
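To make the retry budget in the example above concrete, here is a minimal Python sketch, assuming a resolver that walks every (address, transport) pair in turn and sends at most three queries over each before treating that transport as unresponsive for the given query. The server labels `ns1`/`ns2`, the IPv6 addresses (taken from the 2001:db8::/32 documentation prefix), the `send` callback, and the fixed iteration order are all hypothetical; only the three-query limit and the 2x2x2 topology come from the thread.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Section 3.1: at most two retries per (server address, transport) for a
# given query, i.e. three queries in total.
MAX_QUERIES_PER_TRANSPORT = 3

# Illustrative topology: two name servers, two addresses each, two transports.
# Server labels and IPv6 addresses are hypothetical placeholders.
SERVERS: Dict[str, List[str]] = {
    "ns1": ["199.43.135.53", "2001:db8::1"],
    "ns2": ["199.43.133.53", "2001:db8::2"],
}
TRANSPORTS: List[str] = ["udp", "tcp"]

# Hypothetical transport hook: returns a response, or None on timeout.
SendFn = Callable[[str, str], Optional[bytes]]


def resolve_with_budget(
    servers: Dict[str, List[str]], transports: List[str], send: SendFn
) -> Tuple[str, int]:
    """Try each (address, transport) pair with at most three queries.

    Returns the outcome and the total number of queries sent.
    """
    sent = 0
    for addresses in servers.values():
        for address in addresses:
            for transport in transports:
                for _ in range(MAX_QUERIES_PER_TRANSPORT):
                    sent += 1
                    if send(address, transport) is not None:
                        return "answered", sent
                # Budget spent: this transport is now considered
                # unresponsive, but only for this given query.
    # Every server and transport was unresponsive.
    return "SERVFAIL", sent
```

With a `send` that always times out, `resolve_with_budget(SERVERS, TRANSPORTS, lambda addr, proto: None)` returns `("SERVFAIL", 24)`, matching the 2x2x2x3 count above. The fixed walk order is only for illustration; server selection is outside this sketch.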
>
> Then, section 3.2 is about caching and says that the resolution failure MUST
> be cached for at least 5 seconds, but otherwise gives implementations a lot
> of freedom in how to do that. Could be by query tuple, by server/transport,
> or some other way.

Right! 3.2 solves this.

> > Also, "MUST NOT" is pretty strong language. Given the various process
> > models of resolver implementations, two subprocesses (threads) both
> > retrying the same or a similar thing a few times cannot always be avoided.
> > Would you settle for SHOULD NOT? The "given" in "retry a given query"
> > gives some leeway, but not enough, I feel.
>
> We feel that MUST NOT is appropriate but would like more input from working
> group members and implementors especially.

Ok

> > "may retry a given query over a different transport .. believe .. is
> > available" - this ignores that some transports have better security
> > properties than others. One currently active draft in this area is
> > draft-ietf-dprive-unilateral-probing. Perhaps add some wording, without
> > being too prescriptive, such as "available, and compatible with the
> > resolver's security policies, ..".
>
> We think “compatible with the resolver’s security policies” goes without
> saying, but don’t mind making it explicit.

I am inclined to agree, and will leave this for others to judge.

> >
> > ## 3.2
> >
> > A previous review
> > (https://mailarchive.ietf.org/arch/msg/dnsop/sJlbyhro-4bDhfGBnXhhD5Htcew/)
> > suggested that the then-chosen tuple was not specific enough, and also said
> > it was too prescriptive. I agree with both.
> > The current draft prescribes nothing, which I'm generally a fan of!
> >
> > However, a coworker (the one likely responsible for implementing this
> > draft, if it turns out our implementation deviates from its final form)
> > told me "some guidance would be nice". After some discussion on
> > prescriptiveness, here is our suggestion: do not prescribe, but mention
> > (without wanting to be complete) a few tuple formats that might make sense,
> > and suggest that implementations document what they choose here.
>
> The relevant text here currently says:
>
>    The implementation might cache different resolution failure conditions
>    differently. For example, DNSSEC validation failures might be cached
>    according to the queried name, class, and type, whereas unresponsive
>    servers might be cached only according to the server's IP address.
>
> So we provide two examples, although not really phrased as “tuples”. I guess
> you’re suggesting to see more options here and talk about them more as tuples?

Yes, I think that would make sense.

> For the documentation suggestion, maybe something like this? “Developers
> SHOULD document their implementation choices so that operators know what
> behaviors to expect when resolution failures are cached.”

Wonderful.

>
> First, we apologize for not realizing that this and two other “for
> discussion” questions were not yet resolved. We plan to remove the first
> (from the Introduction).
>
> For the one that was in section 2.6, we propose this updated text and new
> section 3.4:
>
>    2.6. DNSSEC Validation Failures
>
>    For zones that are signed with DNSSEC, a resolution failure can occur
>    when a security-aware resolver believes it should be able to
>    establish a chain-of-trust for an RRset but is unable to do so,
>    possibly after trying multiple authoritative name servers.
>    DNSSEC validation failures may be due to signature mismatch, missing
>    DNSKEY RRs, problems with denial-of-existence records, clock skew, or
>    other reasons.
>
>    Section 4.7 of [RFC4035] already discusses the requirements and
>    reasons for caching validation failures. Section 3.4 of this
>    document strengthens those requirements.

Good.

>    3.4. DNSSEC Validation Failures
>
>    Section 4.7 of [RFC4035] states:
>
>       To prevent such unnecessary DNS traffic, security-aware resolvers MAY
>       cache data with invalid signatures, with some restrictions.
>
>    This document updates [RFC4035] with the following, stronger
>    requirement:
>
>       To prevent such unnecessary DNS traffic, security-aware resolvers
>       MUST cache DNSSEC validation failures, with some restrictions.

Good :)

> And for the one in section 3.3 we propose this:
>
>    3.3. Requerying Delegation Information
>
>    Section 2.1 of [RFC4697] identifies circumstances in which "every
>    name server in a zone's NS RRSet is unreachable (e.g., during a
>    network outage), unavailable (e.g., the name server process is not
>    running on the server host), or misconfigured (e.g., the name server
>    is not authoritative for the given zone, also known as 'lame')." It
>    prohibits unnecessary "aggressive requerying" to the parent of a
>    non-responsive zone by sending NS queries.
>
>    The problem of aggressive requerying to parent zones is not limited to
>    queries of type NS. This document updates the requirement from
>    section 2.1.1 of [RFC4697] to apply more generally: Upon encountering
>    a zone whose name servers are all non-responsive, a resolver MUST
>    cache the resolution failure. Furthermore, the resolver MUST limit
>    queries to the non-responsive zone's parent zone (and other ancestor
>    zones) just as it would limit subsequent queries to the non-responsive
>    zone.

Looks great. Thanks!
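The caching behaviour discussed for sections 3.2 to 3.4 can be sketched as a single failure cache keyed by tuples, so that different failure conditions use differently shaped keys. This is a minimal Python illustration under stated assumptions, not draft text: the class and function names, the exact key layouts, and the reading of section 3.3 as "a cached non-responsive zone also blocks delegation requeries to its ancestors" are all hypothetical. Only the 5-second minimum and the two example key shapes (queried name, class, and type for DNSSEC validation failures; server IP address for unresponsive servers) come from the thread.

```python
import time
from typing import Callable, Dict, Hashable, Optional

# Section 3.2: a resolution failure MUST be cached for at least 5 seconds.
MIN_FAILURE_TTL = 5.0


class FailureCache:
    """Hypothetical resolution-failure cache; keys are arbitrary tuples."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: Dict[Hashable, float] = {}

    def add(self, key: Hashable, ttl: float = MIN_FAILURE_TTL) -> None:
        # Clamp to the 5-second floor; longer lifetimes are local policy.
        self._expiry[key] = self._clock() + max(ttl, MIN_FAILURE_TTL)

    def contains(self, key: Hashable) -> bool:
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[key]  # expired: drop the entry lazily
            return False
        return True


def _norm(name: str) -> str:
    # Case-insensitive, trailing-dot-insensitive owner names.
    return name.lower().rstrip(".")


# Assumed key shapes; an implementation would document its own choices.
def dnssec_failure_key(qname: str, qclass: str, qtype: str) -> tuple:
    return ("dnssec", _norm(qname), qclass, qtype)


def unresponsive_server_key(server_ip: str) -> tuple:
    return ("server", server_ip)


def unresponsive_zone_key(zone: str) -> tuple:
    return ("zone", _norm(zone))


def failed_enclosing_zone(cache: FailureCache, qname: str) -> Optional[str]:
    """Return the closest enclosing zone cached as non-responsive, if any.

    Sketch of one reading of section 3.3: while this returns a zone, the
    resolver sends no further queries for it, including requeries of any
    type (not just NS) to its parent or other ancestor zones.
    """
    labels = _norm(qname).split(".")
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if cache.contains(unresponsive_zone_key(zone)):
            return zone
    return None
```

Passing the clock in as a parameter keeps the sketch testable without sleeping, and exposing each key layout as a small named function is one way to follow the suggestion that developers document which tuples they cache on.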
Kind regards,
--
Peter van Dijk
PowerDNS.COM BV - https://www.powerdns.com/

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop