In message <[email protected]>, Joe Abley writes:
> Hi Paul, Warren,
> 
> On 4 July 2014 at 16:50:08, Paul Hoffman ([email protected]) wrote:
> 
> > Greetings. Warren and I have done a major revision on this draft, 
> narrowing the design  
> > goals, and presenting more concrete proposals for how the mechanism 
> would work. We welcome  
> > more feedback, and hope to discuss it in the WG in Toronto.
> 
> I think there is much in the language of this draft that could be 
> tightened up, but this is an idea for discussion so I'll avoid a pedantic 
> line-by-line dissection. But I can give you the full pedantry if you like 
> :-)
> 
> On the pros and cons, however (crudely pasted below), see below.
> 
> TL;DR: there are way more cons than pros to this proposal. The pros 
> listed are weak; the cons listed are serious. I don't see a net advantage 
> to the DNS (or to perceived performance of the DNS for any client) here. 
> This proposal, if implemented, would represent non-trivial additional 
> complexity with minimal or no benefit. I am not in favour of it, if 
> that's not obvious.
> 
> As noted previously, I am not against documenting and discussing the 
> merits of slaving the root zone on resolvers (in some fashion). My 
> preference would be for a draft called something like 
> "draft-ietf-dnsop-slaving-root-on-resolvers-harmful", which could borrow 
> much of your section 5.1 and 5.2 to make its argument.
> 
> I remain very much *not* in favour of making changes to the DNS 
> specification that don't have a clear benefit to balance their costs.
> 
> ---
> 
> 5.1. Pros
> 
>  o Junk queries / negative caching - Currently, a significant number
>    of queries to the root servers are "junk" queries. Many of these
>    queries are for TLDs that do not (and may never) exist in the root.
>    Another significant source of junk is queries where the negative
>    TLD answer did not get cached because the queries are for second-
>    level domains (a negative cache entry for "foo.example" will not
>    cover a subsequent query for "bar.example").
> 
> I think a better way to accommodate the second point is to implement 
> qname minimisation in recursive server logic.

When you can get rid of all the servers in the world that followed
RFC 2535 and return NXDOMAIN for empty non-terminals, qname
minimisation and this sort of logic will be viable, though it won't
do anywhere near as good a job as having a local copy of the
root zone.
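For reference, here is roughly what the walk looks like under qname
minimisation (a simplified Python sketch in the spirit of RFC 7816, not a
full resolver; a real implementation also tracks zone cuts):

```python
def minimised_qnames(qname: str) -> list[str]:
    """Sequence of names a qname-minimising resolver would ask on the
    way to resolving `qname`, starting at the TLD and adding one label
    per step (simplified sketch; real resolvers follow zone cuts)."""
    labels = [l for l in qname.rstrip(".").split(".") if l]
    return [".".join(labels[-i:]) + "." for i in range(1, len(labels) + 1)]

# A broken server that answers NXDOMAIN at the empty non-terminal
# "foo.example." derails this walk before "bar.foo.example." is asked.
```

So `minimised_qnames("bar.foo.example.")` produces the queries
"example.", "foo.example.", "bar.foo.example." in order, and an
erroneous NXDOMAIN at any intermediate step kills the resolution.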

> I don't know that the first point is much of a pro. Root server operators 
> need to provision significant spare capacity in order to accommodate 
> flash crowds and attack traffic, and compared to that spare capacity the 
> volume of junk queries is extremely small. There's no obvious operational 
> benefit to root server operators in reducing their steady-state query 
> load (in fact, it would make it harder in some cases to obtain the 
> exchange point capacity you need to accommodate flash crowds, on 
> exchanges where higher-capacity ports are reserved for those that have 
> demonstrable need based on steady-state traffic.)

But there is a big benefit to cache operators.  The bigger the client
base, the bigger the benefit.
 
> I'm also a little concerned about the word "junk". It's a pejorative term 
> that implies assumptions about the intent of the original query. If my 
> intent is to confirm that a top-level label doesn't exist, then 
> "BLAH/IN/SOA" is a perfectly reasonable query for me to send to a root 
> server. We might assume that a query "Joe's iPhone/IN/SOA" sent to a root 
> server is not reasonable, but we're only assuming; we don't actually have 
> a way of gauging the actual intent with any accuracy.
> 
>  o DoS against the root service - By distributing the contents of the
>    root to many recursive resolvers, the DoS protection for customers
>    of the root servers is significantly increased. A DDoS may still
>    be able to take down some recursive servers, but there is much
>    more root service infrastructure to attack in order to be
>    effective. Of course, there is still a zone distribution system
>    that could be attacked (but it would need to be kept down for a
>    much longer time to cause significant damage, and so far the root
>    has stood up just fine to DDoS).
> 
> If I was to paraphrase this advantage with malicious intent :-), you mean 
> that "we don't have to rely upon the root server system to continue to 
> perform under attack, because we don't need the root server system any 
> more, although we do need the new bits of the root server system we are 
> specifying, and if those bits are not available we do need the 
> conventional root server system after all, but that's probably ok because 
> the root server system is pretty resilient". That sounds a bit circular.
> 
>  o Small increase to privacy of requests - This also removes a place
>    where attackers could collect information. Although query name
>    minimization also achieves some of this, it does still leak the
>    TLDs that people behind a resolver are querying for, which may in
>    itself be a concern (for example someone in a homophobic country
>    who is querying for a name in .gay).
> 
> There's an implication here that a recursive resolver sending a query to 
> a root server is potentially impinging upon the privacy of its anonymous 
> clients. I find that a bit difficult to swallow.

Given the intelligence that root server operators have gleaned in the
past, there is a degree of credibility here.

> I'm surprised not to see "improves performance for clients" in this list, 
> on the general principle that every cache miss that triggers a query to a 
> root server will take longer than consulting a pre-fetched root zone. I'm 
> glad about that, though, since I think that performance improvement is 
> (a) minuscule in normal operation, affecting 1/BIGNUM clients who expose 
> a cache miss and (b) also achievable in the steady state by resolvers 
> that perform cache pre-fetching (e.g. hammer-like behaviour).
> 
> My overall summary for 5.1 is that there's no clear benefit in 
> performance, reliability or stability from making this change.
> 
> 5.2. Cons
> 
>  o Loss of agility in making root zone changes - Currently, if there
>    is an error in the root zone (or someone needs to make an
>    emergency change), a new root zone can be created, and the root
>    server operators can be notified and start serving the new zone
>    quickly. Of course, this does not invalidate the bad information
>    in (long TTL) cached answers. Notifying every recursive resolver
>    is not feasible. Currently, an "oops" in the root zone will be
>    cached for the TTL of the record by some percentage of servers.
>    Using the technique described above, the information may be cached
>    (by the same percentage of servers) for the refresh time + the TTL
>    of the record.
> 
> A new root zone is published usually two (but sometimes more) times per 
> day. The semantics specified in the draft for refreshing a local copy of 
> the root zone say "keep re-using the copy you have until it expires". If 
> I assume that "expire" means "survives beyond SOA.EXPIRE seconds of when 
> we originally fetched it", then there's the potential for stale data to 
> be published for a week plus however old the originally-retrieved file 
> was (which is difficult to determine, in contrast to the traditional root 
> zone distribution scheme). I think this disadvantage is more serious than 
> is presented.

Slaves perform refresh queries every 30 minutes (refresh = 1800).
Oopses actually clear up faster with slaves than without, as many of
the responses are now direct to stub rather than cached responses,
which have much higher TTLs.

If one was really worried, one could keep a log of the last 24 hours
of zone transfers and issue a NOTIFY to all of the sources that
transferred the zone.  Normal refresh logic would then kick in for
a large percentage of slaves.  This is permitted by the RFCs.
Machines are actually good at doing this sort of thing.
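As a sketch of how little machinery that NOTIFY step needs (raw packet
construction per RFC 1996, standard library only; in practice the
server's own NOTIFY code would do this for you):

```python
import struct

def make_notify(zone: str = ".", msg_id: int = 0x1234) -> bytes:
    """Build a minimal DNS NOTIFY request (RFC 1996) for `zone`.
    Flags: QR=0 (request), opcode=4 (NOTIFY), AA=1 as RFC 1996 suggests."""
    flags = (4 << 11) | (1 << 10)                # opcode NOTIFY, AA bit
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)
    name = b""                                   # wire-format zone name
    for label in zone.rstrip(".").split("."):
        if label:
            name += bytes([len(label)]) + label.encode("ascii")
    name += b"\x00"                              # root label terminator
    question = name + struct.pack("!HH", 6, 1)   # QTYPE=SOA, QCLASS=IN
    return header + question

# Sending is then one UDP datagram per logged transfer source, e.g.:
#   socket.socket(socket.AF_INET, socket.SOCK_DGRAM) \
#         .sendto(make_notify(), (addr, 53))
```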

This is actually a pro not a con.

>  o No central monitoring point - DNS operators lose the ability to
>    monitor the root system. While there is work underway to
>    implement better instrumentation of the root server system, this
>    (potentially) removes the thing to monitor.
> 
> In fact there's exactly the same ability to monitor the root server 
> system; it's just that the data available through such monitoring will be 
> different (as you point out in the second sentence). OK, this one is a 
> bit pedantic.
> 
>  o Increased complexity in nameserver software and their operations -
>    Any proposal for recursive servers to copy and serve the root
>    inherently means more code to write and execute. Note that many
>    recursive resolvers are on inexpensive home routers that are
>    rarely (if ever) updated.
> 
> You don't require universal deployment for this scheme to work, so the 
> long tail of DNS software upgrades is arguably not a great concern. I think 
> the increased complexity in operations is significant, though. My 
> observation is that people already have enough difficulty troubleshooting 
> DNS problems, and adding in a brand new set of services for root zone 
> distribution that also need to be considered (potentially a set of 
> services that are not subject to the same internal and external scrutiny 
> of root server operations, and which are potentially operated by people 
> that are even less familiar and easy to reach than root server operators 
> are) is only going to make things worse.

In the 10+ years I have been slaving the root zone I have not once
needed to troubleshoot it.  Troubleshooting a slaved root zone is no
different from troubleshooting any other slave zone.

I call this one FUD.
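For anyone who wants to try it, the configuration is a handful of
lines; a named.conf stanza along these lines (the masters shown are
documentation-range placeholders, not real transfer sources;
substitute servers that actually permit AXFR/IXFR of the root):

```
zone "." {
    type slave;
    file "root.slave";
    masters {
        192.0.2.1;   // placeholder: a server allowing root zone transfer
        192.0.2.2;   // placeholder
    };
    notify no;
};
```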
 
> If there was a significant benefit to performance or reliability to 
> balance that increased operational complexity, I might see a reason to do 
> it. I don't see that benefit, though.
> 
>  o Changes the nature and distribution of traffic hitting the root
>    servers - If all the "good" recursive resolvers deploy root
>    copying, then root servers end up servicing only "bad" recursive
>    resolvers and attack traffic. The roots (could) become what AS112
>    is for RFC1918.
> 
> The difference is that queries directed at AS112 servers are definitively 
> junk. They are requests for names that cannot exist with global 
> uniqueness, since they correspond to infrastructure that is not globally 
> unique. By contrast, the root servers will always receive queries that 
> are important to answer (from an end-user's perspective), even if the 
> proportion of such queries declines following the unexpected and 
> optimistic changes in resolver behaviour implied in that paragraph.
> 
> Architecturally, I don't think you improve the quality of a service by 
> reducing the impact of failure and giving the operators busy-work to fill 
> their time.

Root servers end up servicing mostly SOA/IXFR queries from updated
recursive servers: 46 of these a day which say "up-to-date", along
with 2 IXFR/AXFR responses which actually transfer zone content and
which fall into a 30-minute window after each zone update.

They also deal with all the legacy recursive server traffic.  This
traffic has exactly the same ratio of "junk" queries as today's
traffic does.

Changing the traffic patterns at the root is neutral.  It is neither
good nor bad.

> Joe
> 
> 
> _______________________________________________
> DNSOP mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/dnsop

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: [email protected]
