[DNSOP] Fundamental ANAME problems

Brian Dickson Thu, 01 Nov 2018 15:35:07 -0700

Greetings, DNSOP folks.

First, a disclaimer and perspective statement:
These opinions are mine alone, and do not represent any official position
of my employer.
However, I will note that it is important to have the perspective of one
segment of the DNS ecosystem, that of the authority operators (also known
as DNS hosting operators).
IMNSHO, authority operators provide a critical element of the DNS
ecosystem, operating DNS zones for the vast majority of DNS registrants.


The important element of this perspective is that changes to how DNS
operates and scales, if they have an adverse impact on authority operators
(at scale), have a potential knock-on impact on everyone.
If (and I realize this is a big "if") changes are made which adversely
affect the cost (regardless of whether it is direct or
indirect/consequential) of operating authority services, this puts at risk
the ability of registrants to operate their own zones (e.g. if there are
fewer authority operators, or if prices skyrocket).
This further puts at risk the ongoing volume of DNS registrations, which
impacts the viability of everyone else whose business relies on
registration fees, directly or indirectly, including TLDs, CDNs, (non-DNS)
hosting, and even ICANN itself.
Caveat dnsops.

Given the above, I feel it is important to point out several problems that
are rooted in the requirement to dynamically update the sibling records
that is present in the current design of ANAME. (This is the only real
problem I see, but it's a doozie.)

(The introduction text mentions some of these, but IMHO doesn't adequately
address their impact.)

First, there is the issue of imposed update frequency.

The requirement on update rate is imposed externally by whichever entity
operates the ANAME target. In other words, this is not under the direct
control of the zone operator, and is a potentially (and very likely)
UNBOUNDED operational impact/cost.

Second, this issue is compounded by scale.

The issue here is that the larger the entity that operates zones with
ANAMEs, the larger the resulting impact. This is a new, unanticipated,
asymmetric cost. It has the definite potential to make operating authority
servers prohibitively costly.

Third, there is an issue with the impact to anycast operation of zones with
ANAMEs, with respect to differentiated answers, based on topological
locations of anycast instances.

There is currently an expectation that, when a given name is ultimately
served (at the end of a *NAME chain) by an entity doing "stupid DNS tricks"
(e.g. CDNs), the answer provided is topologically appropriate, i.e. gives
the "best" answer based on resolver (or, in the case of client-subnet,
client) location.
When done using CNAMEs, the resolver is the entity following the chain, and
does so in a topologically consistent manner. Each resolver instance
querying a sequence of anycast authorities which return respective CNAMEs
gets its own unique, topologically-appropriate answers, and there is no
requirement or expectation that resolvers in topologically distinct
locations have any mutual consistency.
ANAME places the authority servers in an anycast cloud in a "Hobson's
choice" scenario. Either a single, globally identical sibling value is
replicated to the anycast instances (which violates the expectation of
resolvers regarding "best" answer), or each anycast instance needs to do
its own sibling maintenance (with all that implies, including on-the-fly
DNSSEC signing), or the anycast cloud now has to maintain its own set of
divergent, signed answers at the master, and add all the complexity of
distributing and answering based on resolver topological placement. (The
last two have significant risk and operational complexity, multiplied by
the volume of zones served, and impacted by the size of the anycast cloud.)

To summarize:
The requirement to maintain sibling records (A/AAAA) is itself absolutely a
"camel back breaking" requirement. The issues are: the frequency of updates
required is externally imposed; either the correctness required by ANAME
targets is broken (using a single A/AAAA value regardless of anycast
location), or the complexity of performing A/AAAA updates is compounded by
at least NxM (N anycast locations of the authority operator, M disparate
values provided in response to A/AAAA queries to the ANAME target); plus,
the added requirement of on-the-fly DNSSEC signing is a non-scalable and
security-challenging non-starter.
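
To make the scaling concern concrete, here is a back-of-the-envelope sketch
in Python. It assumes sibling records are refreshed once per target TTL, and
it simply applies the NxM factor from the summary above. The function name
and every number in the usage line are hypothetical, chosen purely for
illustration; they are not measurements.

```python
SECONDS_PER_DAY = 86400


def sibling_refreshes_per_day(zones, target_ttl_seconds,
                              anycast_sites=1, distinct_answers=1):
    """Rough count of sibling A/AAAA refreshes per day.

    Assumption: every ANAME zone is refreshed once per target TTL, and the
    work is multiplied by N anycast sites and M distinct target answers.
    """
    if target_ttl_seconds <= 0:
        raise ValueError("target TTL must be positive")
    refreshes_per_zone = SECONDS_PER_DAY // target_ttl_seconds
    return zones * refreshes_per_zone * anycast_sites * distinct_answers


# Hypothetical operator: the target's TTL, not the operator, sets the pace.
load = sibling_refreshes_per_day(zones=1_000_000, target_ttl_seconds=60,
                                 anycast_sites=20, distinct_answers=5)
print(load)
```

The point of the sketch is that the only input the authority operator does
not control (the target TTL) is the one that drives the result, and that
each refresh would also need a fresh signature in a signed zone.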

Side-note: we, as a community, have been pushing for wide-scale adoption of
DNSSEC; the sibling-record requirement definitely places a significant
hurdle in the way of adoption, precisely in a wide-scale manner, i.e. for
the vast majority of DNS registrants. It is a big roadblock to DNSSEC
adoption, and a move in the wrong direction.

What are the alternatives?

Fundamentally, the desired behavior that we are collectively trying to
preserve is that of resolver-based *NAME chain resolution, just with the
ability to do so at the apex of a zone.

This points to the only logical places that MUST be part of any apex-based
chaining of resolution: resolvers, or clients.
(I include clients as an option, since it is at least feasible to include
client-based multiple-lookups as an alternative to "additional processing"
that would otherwise be needed in resolvers.)

Ultimately, this means any solution that has this characteristic can only
provide backwards compatibility to clients if resolvers are updated, or
alternatively, if clients are updated to do whatever is required that
resolvers which aren't updated won't do. (Sorry for the badly written
logic, English is not really well suited for branching logic.)

There would still be a requirement on authority servers, but it would not
provide transparent backwards compatibility without updates to resolvers
or clients. There would need to be SOME response to give to resolvers,
telling them where to go and what to do. The only differences are that
sibling records are not provided, and that response processing changes to
select and return the new record type(s). On the plus side, in this
different model, the RRs would be basically constant, allowing the scaling
of authority operators to be unaffected, and there would not be a
requirement for dynamic updates or frequent/on-the-fly DNSSEC signing.

There are several choices, and which one is chosen determines what logic
change is needed, where the new RRtype(s) could be used, and how resolvers
and/or clients would handle the new responses.

If the current limited ANAME (without sibling) is used, the logic would be
to return it only if the corresponding query types (A or AAAA) are seen,
there is no RRset of that type at the owner name, and there is an ANAME
record. The processing on the authority, and resolver or client, would be
pretty much as in the current draft, modulo not having the sibling record.
If the response to an A/AAAA query is an ANAME, follow the chain until an
A/AAAA record is found, or the chain breaks. If this is done by the
resolver, special handling on TTL and new client queries would be necessary.
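
The selection rule and chain-following described above can be sketched
roughly as follows. This is a minimal Python illustration over a
hypothetical in-memory record table, not an implementation of the draft:
the function names, the table layout and the hop limit are all assumptions,
and TTL handling and DNSSEC are left out entirely.

```python
ADDRESS_TYPES = {"A", "AAAA"}


def authority_answer(records, qname, qtype):
    """Return (rrtype, rdata_list), or None where NODATA would be sent.

    `records` is a hypothetical table: {(owner, rrtype): [rdata, ...]}.
    """
    rrset = records.get((qname, qtype))
    if rrset:
        return (qtype, rrset)
    # Sibling-less ANAME: only offered for address queries with no RRset
    # of the queried type at the owner name.
    if qtype in ADDRESS_TYPES:
        aname = records.get((qname, "ANAME"))
        if aname:
            return ("ANAME", aname)
    return None


def follow_aname(records, qname, qtype, max_hops=8):
    """Follow ANAMEs until an address RRset is found or the chain breaks."""
    for _ in range(max_hops):
        answer = authority_answer(records, qname, qtype)
        if answer is None:
            return None  # chain broke
        rrtype, rdata = answer
        if rrtype == qtype:
            return rdata
        qname = rdata[0]  # ANAME target becomes the next query name
    return None  # hop limit (an assumption) guards against loops


records = {
    ("example.com.", "ANAME"): ["cdn.example.net."],
    ("cdn.example.net.", "A"): ["192.0.2.10"],
}
print(follow_aname(records, "example.com.", "A"))
```

Note that the records held by the authority never change in this model;
only the party calling follow_aname (resolver or client) sees the target's
current addresses.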

Another option would be a similar type, maybe ACNAME (address CNAME), which
could co-exist with other types, and could occur both at an apex and
anywhere else in a zone. Same basic processing logic.

A third option would be, for lack of a better term, WCRR, wildcard RR type,
whose RDATA is an FQDN. The logical processing would be to add handling at
the point where "NOERROR, NODATA" would otherwise be returned, right above
the WILDCARD in the resolution process. If no match for QTYPE is found,
check for WCRR type, and return that if found. Resolver handling would be
the same; if a WCRR type is returned, replace the QNAME with the RDATA and
re-start query processing.
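
As a rough sketch of the WCRR idea, assuming a hypothetical table keyed by
owner name: the WCRR check sits exactly where NOERROR/NODATA would
otherwise be returned, and the resolver side restarts the query with the
RDATA as the new QNAME. WCRR is only a proposal in this message, so
everything here (names, status strings, restart limit) is illustrative;
wildcard synthesis is not modelled.

```python
def lookup_with_wcrr(nodes, qname, qtype):
    """Authority-side lookup over {owner: {rrtype: rdata}}.

    Returns (status, data) with status one of
    "ANSWER", "WCRR", "NODATA", "NXDOMAIN".
    """
    node = nodes.get(qname)
    if node is None:
        return ("NXDOMAIN", None)
    if qtype in node:
        return ("ANSWER", node[qtype])
    # Point where NOERROR/NODATA would normally be returned.
    if "WCRR" in node:
        return ("WCRR", node["WCRR"])  # RDATA is a single FQDN
    return ("NODATA", None)


def resolve_with_wcrr(nodes, qname, qtype, max_restarts=8):
    """Resolver-side handling: on WCRR, replace QNAME and restart."""
    for _ in range(max_restarts):
        status, data = lookup_with_wcrr(nodes, qname, qtype)
        if status != "WCRR":
            return (status, data)
        qname = data
    return ("SERVFAIL", None)  # restart limit is an assumption


nodes = {
    "example.com.": {"SOA": ["..."], "WCRR": "host.example.net."},
    "host.example.net.": {"A": ["192.0.2.10"], "MX": ["10 mail.example.net."]},
}
print(resolve_with_wcrr(nodes, "example.com.", "MX"))
```

Because the fallback is not tied to A/AAAA, the same two functions alias
any QTYPE that is absent at the owner name, while types that are present
(SOA in the example) are answered directly.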

Client handling of these could be done via simultaneous queries for A (or
AAAA) and ANAME (or WCRR), and following ANAME/WCRR replies to their target
if no A/AAAA response is received and an ANAME/WCRR is received.
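
A client doing this itself might look roughly like the Python sketch
below. The `query` callable is a hypothetical stand-in for whatever stub
resolver API the client has (it takes a name and an RRtype and returns a
list of RDATA strings, empty if there is no answer); the function name and
the hop limit are likewise assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def client_lookup(query, name, addr_type="A", alias_type="ANAME", max_hops=8):
    """Query address and alias types together; follow the alias if needed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(max_hops):
            # Issue both queries at once rather than one after the other.
            addr_future = pool.submit(query, name, addr_type)
            alias_future = pool.submit(query, name, alias_type)
            addresses = addr_future.result()
            aliases = alias_future.result()
            if addresses:
                return addresses  # a direct address answer wins
            if not aliases:
                return []  # neither an address nor an alias: give up
            name = aliases[0]  # follow the ANAME/WCRR target
    return []  # hop limit reached
```

Passing alias_type="WCRR" gives the WCRR variant; nothing else changes on
the client side.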

(IMHO, the WCRR is a bit cleaner to implement on both authority and
resolver, and is more forward-looking, i.e. handles other aliasing at other
RRtypes as well, if needed.)

The last option would be handling all of this junk in the browser, with
either SRV, or whatever new RRTYPE is required that fixes the problem(s)
with SRV that the HTTP folks require.

Brian
