On Tue, 5 Mar 2019, Dave Lawrence wrote:
I can sort of see how someone might infer from "It is predicated on
the observation that authoritative server unavailability can cause
outages ..." that it means this whole idea is constrained to DDoS, and
presumably you would include as well other network and server outages
not caused by DDoS. It doesn't only mean that though. The intention
is that this applies to any inability to get a proper authoritative
response, one which has AA set in a protocol-meaningful way.
This can be edited to be clearer, perhaps as simply as changing
"authoritative server unavailability" to "authoritative answer
unavailability". We'd be happy to consider alternative text.
Ok, then that needs to be clarified in the draft. And you should discuss
exactly which kinds of failures are valid for extending the TTL, which
are not, and which should still lead to trying another auth server.
ServFail is a clear signal that something is going wrong with the
authoritative server itself. If you send a ServFail then AA is
completely irrelevant.
REFUSED is slightly murkier as to its exact meaning, thanks to
overloading, but in its most commonly seen usage, for lameness, it
indicates a clear problem with the delegation. Even in its other use
cases, notably an EDNS Client Subnet error or an actual "I am
authoritative for the name but administratively denying your
resolution of it", I submit that if the resolver has a stale answer
then serving it is reasonable. In that administrative denial case
it'd be better to issue NXDomain anyway, which is exactly what split
horizon authorities do.
Other lesser seen rcodes are largely similar in not indicating
anything at all about the legitimacy of the name and whatever data you
might have previously associated with it. Only the dynamic update
rcodes come close to being relevant, but they are not part of the
resolution process covered by serve-stale.
Despite the unfortunate RFC 1035 nomenclature of NXDomain as "Name
Error", it is called out explicitly because it isn't really an error,
not in the database lookup sense. There's no way of knowing whether
the NXDomain is happening because of operator fault or the far more
likely case that it just doesn't exist. That's why it is called out
separately in the doc, with an explicit note about why it has to be
treated as replacing any stale data associated with the name.
So put some text similar to this in the draft.
Paul
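
For illustration only, the rcode handling argued for in the quoted text
could be sketched roughly as below. This is not text from the draft. The
names classify and Action are hypothetical, and two points are assumptions
of the sketch: a NOERROR response without AA is treated as not being a
proper authoritative answer, and NXDomain replaces stale data whether or
not AA is set. Whether the resolver should first try another auth server
is left to the caller.

```python
from enum import Enum


class Action(Enum):
    USE_FRESH = "use the new response and drop any stale data"
    SERVE_STALE = "stale data, if cached, may be served"


# Standard DNS rcode values 0 through 5 (RFC 1035).
NOERROR, FORMERR, SERVFAIL, NXDOMAIN, NOTIMP, REFUSED = range(6)


def classify(rcode, aa):
    """Decide how a resolver treats one upstream result.

    rcode is None when no response arrived at all (timeout or
    network failure); aa is the AA bit of the response.
    """
    if rcode is None:
        # No authoritative answer could be obtained.
        return Action.SERVE_STALE
    if rcode == NXDOMAIN:
        # Not an error in the lookup sense: replaces stale data.
        return Action.USE_FRESH
    if rcode == NOERROR and aa:
        # A proper authoritative answer.
        return Action.USE_FRESH
    # SERVFAIL, REFUSED, other rcodes, or NOERROR without AA
    # (assumption): nothing is learned about the name's legitimacy.
    return Action.SERVE_STALE


for name, rcode, aa in [("SERVFAIL", SERVFAIL, False),
                        ("REFUSED", REFUSED, False),
                        ("NXDOMAIN", NXDOMAIN, True),
                        ("NOERROR+AA", NOERROR, True),
                        ("timeout", None, False)]:
    print(name, "->", classify(rcode, aa).name)
```

A table like this in the draft would make it explicit which failures
allow the TTL to be extended and which do not.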