Loganaden Velvindron <[email protected]> writes:
Hi Loganaden,
Thanks for the comments about the EDE draft. I've marked up your
comments with responses and actions below. Let us know if you have any
questions.
11 Loganaden Velvindron
==================================================
11.1 NOCHANGE pass-through
~~~~~~~~~~~~~~~~~~~~~~~~~~
1) I see at least one more model that needs to be supported, which is
how to handle edns extended codes that are generated by a remote
server, i.e. passthrough. Layering multiple forwarding resolvers
behind each other is common, and some way to notify the end user that
the originating message was not generated by the first resolver would
be important. I don't know if there needs to be some way to indicate
how "deep" the error was away from the end user; it seems just two
levels (locally generated or non-locally generated) would be
sufficient with only minor thought on it.
Re: 1) This is a good point, but implementation will likely run afoul
of existing standards or else require duplicative response codes or
use of an additional flag in the INFO-CODES section. Perhaps a new
flag type, similar to AA, which can be used to say that this recursor
will return this result reliably/deterministically. Attempting to
provide depth is perhaps unlikely, but flags for
stub/forwarder/recursive/intermediate recursive or a subset of those
might make sense. Perhaps a nondescript flag such as 'DR' for
Deterministic Response. Obviously INFO-CODES can support many
different flags, of which IR (Intermediate Resolver) or such could be
included at the point of response generation, with the last server
providing actual data in the chain being the one to authoritatively
set the flag, which then must not be modified by further downstream
resolvers in the process of returning the response.
+ Response: this has been discussed a few times, and the current view
(that at least I hold, and likely others based on past discussions)
is that it would be best to get this out as is, without a
pass-through model while we deploy it and get operational experience
with its use. Pass-through is complex for a number of reasons (NAT
alone, e.g.), and it's unclear that we can come up with a solution
for all the corner cases likely to appear.
TL;DR: we should definitely work on it, but in the future.
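To make the two-level idea above concrete (locally generated versus
non-locally generated), here is a minimal sketch. It is not part of the
draft: the names Origin, ExtendedError, and pass_through are
hypothetical, and no wire encoding is implied.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Origin(Enum):
    """Hypothetical two-level marker; the draft defines no such field."""
    LOCAL = "local"    # generated by the resolver answering the client
    REMOTE = "remote"  # received from an upstream server and passed on


@dataclass(frozen=True)
class ExtendedError:
    info_code: int
    extra_text: str
    origin: Origin = Origin.LOCAL


def pass_through(upstream_error: ExtendedError) -> ExtendedError:
    """Forward an upstream error unchanged, except for marking it remote."""
    return replace(upstream_error, origin=Origin.REMOTE)
```

Note that a second forwarder calling pass_through() leaves the marker at
REMOTE, so depth is deliberately not tracked, matching the "just two
levels" suggestion.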
11.2 DONE network error code needed beyond timeout
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) SERVFAIL needs another error code to indicate the difference
between a network error (unexpected network response like ICMP, or TCP
error such as connection refused) versus timeout of the remote auth
server, as that is often a confusing issue.
+ Response: looks like a reasonable idea, so it has been added to the
latest draft. Thank you!
Re: 2) Specifics as an item in the below list.
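As a rough illustration of the distinction requested above, a resolver
could separate a timeout from other network errors before deciding
which code to attach to the SERVFAIL. The sketch below is only an
assumption about how that might look; the function name and the
returned labels are hypothetical and are not code points from the
draft.

```python
import socket


def classify_upstream_failure(exc: BaseException) -> str:
    """Map an exception from an upstream query to a coarse failure label."""
    # Check timeouts first: TimeoutError is itself a subclass of OSError.
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    # Remaining OSError subclasses cover cases such as connection refused.
    if isinstance(exc, OSError):
        return "network"
    return "other"
```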
11.3 NOCHANGE
~~~~~~~~~~~~~~
1) Really, I'd like to see a definition of some of the EXTRA TEXT
strings here, since that will be almost immediately an issue that
would need to be sorted out before this could be useful. There have
been some discussions (sorry, don't know if it's a draft or just
talking) about browsers consuming "extra" data in DNS responses that
can do a number of things. As an example that is important to Quad9
(or any blocking-based DNS service) it might be the case that upon
receiving a request for a "blocked" qname/qtype, we would hand back a
forged answer that leads to a splash page as the default result.
However, if the request was made from a resolver stack that had the
EDNS extensions, we might include the "real" result in the EXTRA TEXT
field, as well as a URL that points the user to an explanation of why
that particular qname/qtype was blocked. Or we might add a risk
factor, or type of risk ("risk=100, risktype=phishing") or the like.
This allows a single query to be digestible by "dumb" stacks that we
want to have do the most safe thing, but also allow "smart" resolver
stacks to present a set of options to the end user.
+ Response: again, I suspect that standardizing on an exact structure
(including internationalization) for extra information in a
machine-understandable and parsable format would involve a very
long discussion period. It might be worthy
of future work, and I certainly think it would be valuable, but
(IMHO) it would be better to get this out and work on that as a
follow-on project *if* we could achieve consensus on it (which, I'll
be honest, will be either difficult or take a long time or both).
Re: 3) Seems reasonable.
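For readers wondering what consuming a string such as "risk=100,
risktype=phishing" could look like, here is a naive sketch. The
key=value, comma-separated layout is taken only from the example in the
comment above; it is not a format the draft defines, and the function
name is hypothetical. A real format would need escaping rules, since a
URL value may itself contain commas or equals signs.

```python
from typing import Dict


def parse_extra_text(text: str) -> Dict[str, str]:
    """Parse 'key=value, key=value' pairs; ignore parts without '='."""
    fields: Dict[str, str] = {}
    for part in text.split(","):
        key, sep, value = part.partition("=")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields
```

Free-form text with no key=value pairs yields an empty dict, so a "dumb"
consumer can still fall back to displaying the raw string.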
11.4 NOCHANGE blocked/censored/retry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) I'm confused as to why a "blocked" or "censored" result would have
a retry as mandatory. The resolver gave a canonical answer from the
point of policy.
+ Response: the retry flag is now gone.
Re: 4) See below notes.
Potential inclusions/Adjustments:
11.5 NOCHANGE More retry case thoughts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4.1.3.1: A use case exists where a stale answer should attempt a
retry. A declarative setting for the Retry bit should not be specified
here, but instead guidance on whether or not the R bit should be set
should be included. For example, when using a front-end load balancer,
if the recursive backends are temporarily inaccessible but are
expected to recover in time to handle a subsequent query, it would be
prudent to include the R bit. No additional load would be generated
towards the Authoritatives in this case, and the Intermediate Recursor
may choose to set the R bit or not based on whether the failure mode
appears to be temporary.
4.1.5: Another area where guidance should be provided. Some recursive
resolvers process requests out of order, asynchronously, or will retry
alternative authoritatives post-processing as part of infrastructure
table management and thus may respond to a subsequent query, where
the initial will fail, likely due to timeouts. In our specific case,
due to our use of multiple recursive backend technologies, a
subsequent query failing DNSSEC validation has a significant chance of
being answered by an alternative recursor. See also 4.2.1.
4.2.11: SERVFAIL - Network: The SERVFAIL response is being generated
due to what is clearly identifiable to the answering server as a
network issue. R bit should be set.
4.4.3: Abusive: The answering system considers the query in question
to be abusive for reasons other than load, indicating that the
specific requests are undesired. This could provide hints to Network
Operators or simply poorly configured client implementations that the
specific queries may be part of an amplification or other attack and
should be inspected.
4.4.4: Excessive: The answering system considers the query volume of
the client to be excessive, indicating that it is the volume and not
the content of the queries being refused and that it may be willing to
answer if volume is reduced. This could provide hints to Network
Operators or poorly configured client systems that they need to add
additional endpoints or reduce their request volume to restore
service.
4.4.5: Go Away: The answering system considers further queries from
the client/network to have exceeded thresholds by large margins or
excessive durations, and further queries are likely to be dropped.
This message is an attempt to limit the continued use of resources
terminating queries which will not be answered. This may simply be a
sub-case of Abusive/Excessive, but also is not intended to be sent for
each query, but instead only intermittently, and to bypass the need
for lengthy troubleshooting efforts when drop rules cause a recursor
to seem to have vanished.
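One way to read the three proposed codes (Abusive, Excessive, Go Away)
is by the client behavior each would suggest. The sketch below is an
assumption about such a client, not anything specified in the draft:
the Refusal enum, the halving policy, and the 1 query/second floor are
all hypothetical.

```python
from enum import Enum


class Refusal(Enum):
    """Hypothetical labels for the proposed 4.4.3 to 4.4.5 codes."""
    ABUSIVE = "abusive"      # the content of the queries is the problem
    EXCESSIVE = "excessive"  # the volume of the queries is the problem
    GO_AWAY = "go_away"      # further queries are likely to be dropped


def adjust_rate(current_qps: float, refusal: Refusal) -> float:
    """Return a new query rate for this server after a refusal."""
    if refusal is Refusal.EXCESSIVE:
        # Volume issue: back off, but keep querying at a reduced rate.
        return max(current_qps / 2, 1.0)
    if refusal is Refusal.GO_AWAY:
        # Queries will not be answered: stop sending to this server.
        return 0.0
    # ABUSIVE: rate is not the issue; the queries themselves need inspection.
    return current_qps
```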
4.5.1: The R flag being set here implies that there are potentially
multiple policies in use and that a retry might receive an answer -
which should not be the case with a single intermediate recursive
service. A client, knowing that it has multiple recursive services
with differing policies might retry against a different recursive
service (ex: 8.8.8.8 instead of 9.9.9.9), but this effectively defeats
the policies of the initial recursor, rendering it ineffective. The
use of a specific server as a delineation is also confusing - it
should instead specify that the answering entity - be it a single
server or larger entity, has blocked this response. Also, blocked
should be further defined to avoid collision with the definition of
the Censored response code. Blocked in this case would be used as a
catch-all for anything not otherwise categorized.
4.5.2: See 4.5.1. Censoring is inherently a governmental action and
this should be reserved for that due to the severity and legal
repercussions of attempts to bypass. R bits should not be set.
Censored should be defined in the document to avoid confusion.
4.5.3: Filtered: Differentiated from Blocked/Censored in that this
content has been specifically redacted at the perceived behest of the
client - may include ad-blockers, dnsbl, or other specific cases -
intended to be used by those systems. Would potentially include
corporate IT policies.
4.5.4: Malicious: Differentiated from Blocked and Filtered in that the
answering server believes the response to be actively malicious and
harmful to the requesting systems or applications, and not merely
undesired or offensive. R bits should not be set.
4.5.5: Malicious Upstream - The upstream entity is considered
malicious by the answering server and thus a refusal to respond has
been returned. Details should be included within the INFO-CODE and
potentially EXTRA-TEXT. This is differentiated from Malicious in that
in this case, it is the actual upstream server that is having all
responses blocked, not the content itself - for instance a revoked or
unexpected certificate (such as due to a CAA record) - from which no
responses will be accepted. The R bit being set here depends on
whether the server believes that the specific path is compromised - if
all authoritatives are failed, then a retry will not help. If only one
is, then it will help to get to the non-compromised server. In the
absence of data, the R bit should be set.
It may make sense to create an extension of the R bit, via additional
flag or other field which adds additional context to the retry
declaration, such as that the request should retry the same recursor,
or should instead immediately move to and try the next available.
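The suggested extension of the R bit (retry the same recursor versus
move to the next one) could be modeled on the client side roughly as
follows. This is a sketch under stated assumptions: RetryHint and
next_server are hypothetical names, and nothing here reflects an
encoding in the draft.

```python
from enum import Enum
from typing import List, Optional


class RetryHint(Enum):
    """Hypothetical retry context; not defined by the draft."""
    NO_RETRY = 0
    RETRY_SAME = 1  # retry against the recursor that just answered
    RETRY_NEXT = 2  # move on to the next configured recursor


def next_server(servers: List[str], current: int,
                hint: RetryHint) -> Optional[str]:
    """Pick the server to retry against, or None if no retry applies."""
    if hint is RetryHint.RETRY_SAME:
        return servers[current]
    if hint is RetryHint.RETRY_NEXT and current + 1 < len(servers):
        return servers[current + 1]
    return None
```

With servers ["9.9.9.9", "8.8.8.8"] and current index 0, RETRY_NEXT
selects "8.8.8.8"; from the last server in the list it returns None
rather than wrapping around.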
11.6 TODO synthesized == forged
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4.1.6: Synthesized Answer: This response could be considered a
sub-case of forged. An example of this would be the id.server or
version.bind queries, they cannot be considered forged, but also no
authority truly holds them.
+ Response: I think this is worthy of further thought and I'd love to
hear opinions from others. IMHO, I'm not sure we should get into
micro-error coding. I would say forged, in your examples, still
fits. But there are other cases where I think synthesized may make
sense. Anyone else have thoughts?
11.7 NOCHANGE finish categorizing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Other Notes: INFO-CODE: It would seem best to include a
basic recommendation for a standard DNS-specific RWhois/CRL-like
endpoint which could provide local (non-IANA) information about
returned codes, potentially at a well-known URI, or even within the
DNS itself via TXT records or even within the EXTRA-TEXT field itself.
+ Response: per discussions with others too, which you've hopefully
read, there is a lot of desire for ways to potentially standardize
supplemental information within the EXTRA-TEXT field. However, for
the time being the goal is to get this out and get experience with
how it is used and potentially standardize on the addition of
machine readable supplemental information (URLs being the other
common suggestion). Publishing this first (as is) doesn't get in
the way of future RFCs extending this specification.
--
Wes Hardaker
USC/ISI
_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop