Re: [dnsop] respsize-04 has landed in the repo

Joe Abley Tue, 01 Aug 2006 13:17:37 -0700


On 1-Aug-2006, at 02:30, Paul Vixie wrote:

i'll say again, this deserves a re-read, since it's almost a re-write vs. -03.


Comments interspersed with text, below.

Apologies in advance for being an amateur grammarian. People with real linguistic training should feel free to smack me down with prejudice.

Is it the intention that this draft ultimately be published in the BCP series?

                                    Abstract
   With a mandated default minimum maximum message size of 512 octets,
   the DNS protocol presents some special problems for zones wishing to
   expose a moderate or high number of authority servers (NS RRs).  This
   document explains the operational issues caused by, or related to
   this response size limit.

... and gives guidance to zone administrators and implementers of DNS software? (The former with respect to choosing an appropriate NS set for a zone; the latter with respect to the additional section ordering discussed later on.)

   1 - Introduction and Overview
   1.1. The DNS standard (see [RFC1035 4.2.1]) limits message size to 512
   octets.  Even though this limitation was due to the required minimum IP
   reassembly limit for IPv4, it became a hard DNS protocol limit and is
   not implicitly relaxed by changes in transport, for example to IPv6.

   1.2. The EDNS0 protocol extension (see [RFC2671 2.3, 4.5]) permits
   larger responses by mutual agreement of the requestor and responder.

I say "requester", possibly because I am not really Canadian. I'll continue to say that throughout the document, on the off-chance that it's worth saying.

   However, deployment of EDNS0 cannot be expected to reach every Internet
   resolver in the short or medium term.  The 512 octet message size limit
   remains in practical effect at this time.

I find the use of "short or medium term" and "at this time" vexing in a document that will exist in static form for a long time in the future.


Perhaps this could be rephrased as

"The 512 octet message size limit will remain in practical effect until there is widespread deployment of EDNS0 in DNS resolvers on the Internet. At the time of publication this is not expected to happen in the short or medium term."

   1.3. Since DNS responses include a copy of the request, the space
   available for response data is somewhat less than the full 512 octets.
   Negative responses are quite small, but for positive and delegation
   responses, every octet must be carefully and sparingly allocated.  This
   document specifically addresses delegation response sizes.

   2 - Delegation Details

   2.1. A delegation response will include the following elements:

      Header Section: fixed length (12 octets)
      Question Section: original query (name, class, type)
      Answer Section: (empty)
      Authority Section: NS RRset (nameserver names)
      Additional Section: A and AAAA RRsets (nameserver addresses)

   2.2. If the total response size would exceed 512 octets, and if the data
   that would not fit was "required", then the TC bit will be set
   (indicating truncation).  This will usually cause the requestor to retry


"requester".

You're mixing moods in the first sentence, I think -- if you replace "would exceed" and "would not fit" with "exceeds" and "does not fit", then the indicative-mood "will be" will be able to rest more comfortably in the sentence.

   using TCP, depending on what information was desired and what
   information was omitted.  (For example, truncation in the authority
   section is of no interest to a stub resolver who only plans to consume
   the answer section.)  If a retry using TCP is needed, the total cost of
   the transaction is much higher.  See [RFC1123 6.1.3.2] for details on
   the requirement that UDP be attempted before falling back to TCP.

I think if you lose the brackets around the example it will make the text clearer.

   2.3. RRsets are never sent partially unless TC bit set to indicate
   truncation.  When TC bit is set, the final apparent RRset in the final
   nonempty section must be considered "possibly damaged" (see [RFC1035


"non-empty"

   6.2], [RFC2181 9]).
   2.4. With or without truncation, the glue present in the additional data
   section should be considered "possibly incomplete", and requestors


"requesters"

   should be prepared to re-query for any damaged or missing RRsets.  Note
   that truncation of the additional data section might not be signalled
   via the TC bit since additional data is often optional.
   2.5. DNS label compression allows a domain name to be instantiated only
   once per DNS message, and then referenced with a two-octet "pointer"
   from other locations in that same DNS message.  If all nameserver names
   in a message are similar (for example, all ending in ".ROOT-
   SERVERS.NET"), then more space will be available for uncompressable data


"incompressible"

   (such as nameserver addresses).

Since the rest of the text is so well-annotated with references, a reference to [RFC 1035 4.1.4] seems like it belongs somewhere here.

   2.6. The query name can be as long as 255 characters of presentation
   data, which can be up to 256 octets of network data.  In this worst case
   scenario, the question section will be 260 octets in size, which would
   leave only 240 octets for the authority and additional sections (after
   deducting 12 octets for the fixed length header.)

So, maximum name size is 255 [RFC 1035 2.3.4]; maximum QNAME size is maximum name size plus 1 for the zero-length root label [RFC 1035 4.1.2]; QTYPE is fixed two octets; QCLASS is fixed two octets. 255 + 1 + 2 + 2 = 260. Or something like that.

Something in me wants to see more working in the text above (perhaps broken into a table, or some other easy-to-appreciate layout).
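
For instance, the working might be laid out along these lines. This is only a sketch in Python of the arithmetic quoted above; the constant and function names are mine, and the 255 + 1 figure for QNAME follows my reading above rather than anything I have re-checked against the RFC.

```python
# Worst-case question section, using the figures quoted in 2.6 and above.
MSG_LIMIT = 512   # octets, message size limit without EDNS0
HEADER = 12       # octets, fixed-length header
MAX_NAME = 255    # octets, maximum name size as read above
ROOT_LABEL = 1    # octet, zero-length root label
QTYPE = 2         # octets
QCLASS = 2        # octets


def worst_case_budget():
    """Return (question section size, octets left for authority + additional)."""
    question = MAX_NAME + ROOT_LABEL + QTYPE + QCLASS
    remaining = MSG_LIMIT - HEADER - question
    return question, remaining


question, remaining = worst_case_budget()
print(f"question section: {question} octets")
print(f"left for authority and additional: {remaining} octets")
```

Laid out like this, the 260 and 240 in 2.6 each become the visible result of one sum, which is all I am asking for.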

   2.7. Average and maximum question section sizes can be predicted by the
   zone owner, since they will know what names actually exist, and can
   measure which ones are queried for most often.  For cost and performance
   reasons, the majority of requests should be satisfied without truncation
   or TCP retry.


Is this true?

If the name AFILIAS.INFO exists, then the largest request to an INFO server that can give a delegation response still involves a 260-byte question section (since I can pad my question to the left of AFILIAS.INFO with big labels).

I don't see how knowledge of the contents of the INFO zone could allow me to think that the maximum question section size could be less than 260.

The average section size (for some measure of average) seems like a useful consideration, however, since "average" plays nicely with "majority".

   2.8. Some queries to non-existing names can be large, but this is not a
   problem because negative responses need not contain any answer,
   authority or additional records.  (See [RFC2308 2.1] for more
   information about the format of negative responses.)


I think you can safely lose the round brackets.

   2.9. The minimum useful number of name servers is two, for redundancy
   (see [RFC1034 4.1]).  In case of multihomed name servers, it is

"multi-homed" (bleh, I am wildly inconsistent with this, so I have no real business typing anything here.)

   advantageous to include an address record from each of several name
   servers before including several address records for any one name
   server.  If address records for more than one transport (for example, A
   and AAAA) are available, then it is advantageous to include records of
   both types early on, before the message is full.

   2.10. The best case is no truncation at all.  This is because many
   requestors will retry using TCP by reflex, or will automatically


"requesters"

   re-query for RRsets that are "possibly truncated", without considering
   whether the omitted data was actually necessary.


I think you can lose the quotes around "possibly truncated".

   2.11. Each added NS RR for a zone will add a minimum of between 16 and
   44 octets to every untruncated referral or negative response from the

"non-truncated". "Minimum of between X and Y" seems like an odd phrase (isn't the minimum of between X and Y just X?) but the rest of the sentence provides context.

   zone's authority servers (16 octets for an NS RR, 16 octets for an A RR,
   and 28 octets for an AAAA RR), in addition to whatever space is taken by
   the nameserver name (NS NSDNAME as well as A or AAAA owner name).
   2.12. While DNS distinguishes between necessary and optional resource
   records, this distinction is according to protocol elements necessary to
   signify facts, and takes no official notice of protocol content
   necessary to ensure correct operation.  For example, a nameserver name
   that is in or below the zone cut being described by a delegation is
   "necessary content," since there is no way to reach that zone unless the
   parent zone's delegation includes "glue records" describing that name
   server's addresses.

   2.13. It is also necessary to distinguish between "explicit truncation"
   where a message could not contain enough records to convey its intended
   meaning, and so the TC bit has been set, and "silent truncation", where
   the message was not large enough to contain some records which were "not
   required", and so the TC bit was not set.

   2.14. An delegation response should prioritize glue records as follows.

So, this part here seems like guidance to implementers. Perhaps it would be worth isolating the guidance to implementers and that to zone administrators, and to label them accordingly?

A zone administrator reading the following text might be confused as to how they configure their nameserver to sort the additional section, for example. If it was obvious that this was not configuration work for them but instead behaviour they should expect and request from their nameserver vendor, that might be useful.

   first
      All glue RRsets for one name server whose name is in or below the
      zone being delegated, or which has multiple address RRsets (currently
      A and AAAA), or preferrably both;

"preferably". The EPP specifications use the word "subordinate" to mean "in or below the zone being delegated" (and "superordinate" to indicate the converse). Those might be useful in the interests of avoiding repetition of that long phrase.

   second
      Alternate between adding all glue RRsets for any name servers whose
      names are in or below the zone being delegated, and all glue RRsets
      for any name servers who have multiple address RRsets (currently A
      and AAAA);

   thence
      All other glue RRsets, in any order.
   The goal of this priority scheme is to offer "necessary" glue first,
   avoiding silent truncation for this glue if possible.

What about re-ordering RRsets within each of those categories between successive queries?

Also, this advice seems to indicate that if I have nameservers answering on the addresses:


  199.212.90.4
  204.152.186.101
  204.152.186.102
  2001:4f8:3:ba:202:b3ff:fe8a:608
  2001:4f8:3:ba:202:b3ff:fe8a:605

then it would be better to do something like:

  ns1 (199.212.90.4)
  ns2 (204.152.186.101, 2001:4f8:3:ba:202:b3ff:fe8a:608)
  ns3 (204.152.186.102, 2001:4f8:3:ba:202:b3ff:fe8a:605)

than to separate the v6 addresses out onto separate names, viz:

  ns1 (199.212.90.4)
  ns2 (204.152.186.101)
  ns3 (204.152.186.102)
  ns4 (2001:4f8:3:ba:202:b3ff:fe8a:608)
  ns5 (2001:4f8:3:ba:202:b3ff:fe8a:605)

since there is less risk of an additional section in a delegation response only including glue for one transport if I take the first path rather than the second.

If that's to be the way of things, you might mention this specific design decision (and indicate why one is preferred over the other).
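
To illustrate what I mean, here is a toy model in Python. It is my own simplification, not the draft's algorithm: it ignores the in-zone/below-zone test entirely, orders servers only by how many address families they have, adds each server's glue as a unit, and stops at the first server that does not fit. The 60-octet budget is an arbitrary number chosen to make the difference visible, and all names in the code are invented for the example.

```python
RR_SIZE = {"A": 16, "AAAA": 28}  # octets per glue RR, as in 2.11


def glue_that_fits(servers, budget):
    """Return the (name, families) entries whose glue fits in budget octets.

    Toy ordering: servers with more address families first (the sort is
    stable, so ties keep their input order); stop at the first misfit.
    """
    ordered = sorted(servers, key=lambda s: -len(s[1]))
    included, used = [], 0
    for name, families in ordered:
        cost = sum(RR_SIZE[f] for f in families)
        if used + cost > budget:
            break
        included.append((name, families))
        used += cost
    return included


def transports(included):
    """Set of address families present in the included glue."""
    return {f for _, families in included for f in families}


# The two naming layouts described above, addresses reduced to families.
combined = [("ns1", ("A",)), ("ns2", ("A", "AAAA")), ("ns3", ("A", "AAAA"))]
separate = [("ns1", ("A",)), ("ns2", ("A",)), ("ns3", ("A",)),
            ("ns4", ("AAAA",)), ("ns5", ("AAAA",))]

print(sorted(transports(glue_that_fits(combined, 60))))
print(sorted(transports(glue_that_fits(separate, 60))))
```

In this model the combined layout leaves glue for both transports in the response, while the separate layout leaves only A records, since nothing in the ordering pulls an AAAA-only server forward.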

If in the future there is a third address family that people use to carry DNS traffic over, how would you place a nameserver that had RRs corresponding to all three transports, compared with one that had RRs corresponding to just one or two? Is this worth mentioning?

   2.15. If any "necessary content" is silently truncated, then it is
   advisable that the TC bit be set in order to force a TCP retry, rather
   than have the zone be unreachable.  Note that a parent server's proper
   response to a query for in-child glue or below-child glue is a referral
   rather than an answer, and that this referral MUST be able to contain
   the in-child or below-child glue, and that in outlying cases, only EDNS
   or TCP will be large enough to contain that data.

   3 - Analysis

   3.1. An instrumented protocol trace of a best case delegation response
   follows.  Note that 13 servers are named, and 13 addresses are given.
   This query was artificially designed to exactly reach the 512 octet
   limit.

      ;; flags: qr rd; QUERY: 1, ANS: 0, AUTH: 13, ADDIT: 13
      ;; QUERY SECTION:
      ;;  [23456789.123456789.123456789.\
           123456789.123456789.123456789.com A IN]        ;; @80

      ;; AUTHORITY SECTION:
      com.                 86400 NS  E.GTLD-SERVERS.NET.  ;; @112
      com.                 86400 NS  F.GTLD-SERVERS.NET.  ;; @128
      com.                 86400 NS  G.GTLD-SERVERS.NET.  ;; @144
      com.                 86400 NS  H.GTLD-SERVERS.NET.  ;; @160
      com.                 86400 NS  I.GTLD-SERVERS.NET.  ;; @176
      com.                 86400 NS  J.GTLD-SERVERS.NET.  ;; @192
      com.                 86400 NS  K.GTLD-SERVERS.NET.  ;; @208
      com.                 86400 NS  L.GTLD-SERVERS.NET.  ;; @224
      com.                 86400 NS  M.GTLD-SERVERS.NET.  ;; @240
      com.                 86400 NS  A.GTLD-SERVERS.NET.  ;; @256
      com.                 86400 NS  B.GTLD-SERVERS.NET.  ;; @272
      com.                 86400 NS  C.GTLD-SERVERS.NET.  ;; @288
      com.                 86400 NS  D.GTLD-SERVERS.NET.  ;; @304


      ;; ADDITIONAL SECTION:
      A.GTLD-SERVERS.NET.  86400 A   192.5.6.30           ;; @320
      B.GTLD-SERVERS.NET.  86400 A   192.33.14.30         ;; @336
      C.GTLD-SERVERS.NET.  86400 A   192.26.92.30         ;; @352
      D.GTLD-SERVERS.NET.  86400 A   192.31.80.30         ;; @368
      E.GTLD-SERVERS.NET.  86400 A   192.12.94.30         ;; @384
      F.GTLD-SERVERS.NET.  86400 A   192.35.51.30         ;; @400
      G.GTLD-SERVERS.NET.  86400 A   192.42.93.30         ;; @416
      H.GTLD-SERVERS.NET.  86400 A   192.54.112.30        ;; @432
      I.GTLD-SERVERS.NET.  86400 A   192.43.172.30        ;; @448
      J.GTLD-SERVERS.NET.  86400 A   192.48.79.30         ;; @464
      K.GTLD-SERVERS.NET.  86400 A   192.52.178.30        ;; @480
      L.GTLD-SERVERS.NET.  86400 A   192.41.162.30        ;; @496
      M.GTLD-SERVERS.NET.  86400 A   192.55.83.30         ;; @512

      ;; MSG SIZE  sent: 80  rcvd: 512

   3.2. For longer query names, the number of address records supplied will
   be lower.  Furthermore, it is only by using a common parent name (which
   is GTLD-SERVERS.NET in this example) that all 13 addresses are able to
   fit.

It seems obvious to me, but I think clarity might be served in mentioning label compression again somewhere around here.
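
To make the point concrete, here is a rough sketch of the accounting. It is my own Python rendering of the name-size bookkeeping in the draft's respsize.pl, not code from the draft; the function name make_name_sizer is invented, and it assumes a name can be compressed against any suffix already written into the message.

```python
def make_name_sizer():
    """Return a function giving the octets a name costs in one message.

    Assumption: any suffix already written into the message can be
    replaced by a two-octet compression pointer (RFC 1035 4.1.4).
    """
    seen = set()  # suffixes already present in the message

    def size(name):
        labels = name.lower().rstrip(".").split(".")
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in seen:
                # Leading labels cost length + 1 each, then a pointer.
                return sum(len(label) + 1 for label in labels[:i]) + 2
            seen.add(suffix)
        # Nothing to point at: every label plus the root label.
        return sum(len(label) + 1 for label in labels) + 1

    return size


shared = make_name_sizer()
shared_sizes = [shared(f"{c}.gtld-servers.net") for c in "abcdefghijklm"]

unrelated = make_name_sizer()
unrelated_sizes = [unrelated(n) for n in
                   ("ns-ext.isc.org", "ns.psg.com", "ns.ripe.net", "ns.eu.int")]

print(sum(shared_sizes), "octets for 13 names under a common parent")
print(sum(unrelated_sizes), "octets for 4 names with no common parent")
```

The first name pays for the whole of GTLD-SERVERS.NET and each later one pays only for its own label plus a pointer, which matches the 16-octet spacing of the NS RRs in the trace in 3.1.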

  The following output from a response simulator demonstrates these
   properties:

You mention further below what "green", "yellow", etc are supposed to signify, but I think the secret decoder ring would fit more usefully before the results. The interpretation of those colours as an escalating series towards disaster is possibly not universally understood.

The use of "#" to mean number is also far more common in North America than in other places (there are plenty of places where in this context it might be taken initially to be a precursor to a comment, per sh(1)). So, "4 NS RRs" or "Number of NS: 4" are both clearer than "# of NS: 4", I think.

      % perl respsize.pl a.dns.br b.dns.br c.dns.br d.dns.br
      a.dns.br requires 10 bytes
      b.dns.br requires 4 bytes
      c.dns.br requires 4 bytes
      d.dns.br requires 4 bytes
      # of NS: 4
      For maximum size query (255 byte):
          only A is considered:        # of A is 4 (green)
          A and AAAA are considered:   # of A+AAAA is 3 (yellow)
          preferred-glue A is assumed: # of A is 4, # of AAAA is 3 (yellow)
      For average size query (64 byte):
          only A is considered:        # of A is 4 (green)
          A and AAAA are considered:   # of A+AAAA is 4 (green)
          preferred-glue A is assumed: # of A is 4, # of AAAA is 4 (green)
      % perl respsize.pl ns-ext.isc.org ns.psg.com ns.ripe.net ns.eu.int
      ns-ext.isc.org requires 16 bytes
      ns.psg.com requires 12 bytes
      ns.ripe.net requires 13 bytes
      ns.eu.int requires 11 bytes
      # of NS: 4
      For maximum size query (255 byte):
          only A is considered:        # of A is 4 (green)
          A and AAAA are considered:   # of A+AAAA is 3 (yellow)
          preferred-glue A is assumed: # of A is 4, # of AAAA is 2 (yellow)
      For average size query (64 byte):
          only A is considered:        # of A is 4 (green)
          A and AAAA are considered:   # of A+AAAA is 4 (green)
          preferred-glue A is assumed: # of A is 4, # of AAAA is 4 (green)
   (Note: The response simulator program is shown in Section 5.)

   Here we use the term "green" if all address records could fit, or
   "yellow" if two or more could fit, or "orange" if only one could fit, or
   "red" if no address record could fit.  It's clear that without a common
   parent for nameserver names, much space would be lost.  For these
   examples we use an average/common name size of 15 octets, befitting our
   assumption of GTLD-SERVERS.NET as our common parent name.

   We're assuming an average query name size of 64 since that is the
   typical average maximum size seen in trace data at the time of this
   writing.  If Internationalized Domain Name (IDN) or any other technology
   which results in larger query names be deployed significantly in advance
   of EDNS, then new measurements and new estimates will have to be made.

You mentioned earlier that zone administrators should measure their average query sizes. Rather than risking appearing to present a definitive, all-zones average here (which seems like a contradiction with the earlier guidance), you might use the word "medium" instead.

   4 - Conclusions
   4.1. The current practice of giving all nameserver names a common parent
   (such as GTLD-SERVERS.NET or ROOT-SERVERS.NET) saves space in DNS
   responses and allows for more nameservers to be enumerated than would
   otherwise be possible, since the common parent domain name only appears
   once in a DNS message and is referred to via "compression pointers"
   thereafter.
   4.2. If all nameserver names for a zone share a common parent, then it
   is operationally advisable to make all servers for the zone so served


"so-served"

   also be authoritative for the zone of that common parent.  For example,
   the root name servers (?.ROOT-SERVERS.NET) can answer authoritatively
   for the ROOT-SERVERS.NET.


"... for the ROOT-SERVERS.NET zone".

  This is to ensure that the zone's servers
   always have the zone's nameservers' glue available when delegating.

I don't understand this conclusion; perhaps I'm slow, but I don't see how it follows from the text that precedes it.

If AUTOMAGIC.ORG is delegated to nameservers which are all named under HOPCOUNT.CA, you're saying that all those nameservers should also speak authoritatively for HOPCOUNT.CA?

How important is it to be able to enumerate a full set of glue records when a resolver has already arrived at a server which has the answer it was looking for?

   4.3. Thirteen (13) seems to be the effective maximum number of
   nameserver names usable traditional (non-extended) DNS, assuming a
   common parent domain name, and given that response truncation is
   undesirable as an average case, and assuming mostly IPv4-only
   reachability (only A RRs exist, not AAAA RRs).
   XXX 4.4. Adding up to five IPv6 nameserver address records (AAAA RRs) to
   a prototypical delegation that currently contains thirteen (13) IPv4
   nameserver addresses (A RRs) for thirteen (13) nameserver names under a
   common parent, would not have a significant negative operational impact
   on the domain name system.


Extraneous collaborative editing spoor ("XXX").

   5 - Source Code

I agree with Robert that this source code would be better placed in an appendix, since otherwise it obscures the text-filled sections which follow it.

   #!/usr/bin/perl
   #
   # SYNOPSIS
   #    repsize.pl [ -z zone ] fqdn_ns1 fqdn_ns2 ...
   #        if all queries are assumed to have a same zone suffix,
   #     such as "jp" in JP TLD servers, specify it in -z option
   #
   use strict;
   use Getopt::Std;

   my ($sz_msg) = (512);
   my ($sz_header, $sz_ptr, $sz_rr_a, $sz_rr_aaaa) = (12, 2, 16, 28);
   my ($sz_type, $sz_class, $sz_ttl, $sz_rdlen) = (2, 2, 4, 2);
   my (%namedb, $name, $nssect, %opts, $optz);
   my $n_ns = 0;

   getopt('z', %opts);
   if (defined($opts{'z'})) {
       server_name_len($opts{'z'}); # just register it
   }

   foreach $name (@ARGV) {
       my $len;
       $n_ns++;
       $len = server_name_len($name);
       print "$name requires $len bytes\n";
       $nssect += $sz_ptr + $sz_type + $sz_class + $sz_ttl
               +  $sz_rdlen + $len;
   }
   print "# of NS: $n_ns\n";
   arsect(255, $nssect, $n_ns, "maximum");
   arsect(64, $nssect, $n_ns, "average");

   sub server_name_len {
       my ($name) = @_;
       my (@labels, $len, $n, $suffix);

       $name =~ tr/A-Z/a-z/;
       @labels = split(/\./, $name);
       $len = length(join('.', @labels)) + 2;
       for ($n = 0; $#labels >= 0; $n++, shift @labels) {
           $suffix = join('.', @labels);
           return length($name) - length($suffix) + $sz_ptr
               if (defined($namedb{$suffix}));
           $namedb{$suffix} = 1;
       }
       return $len;
   }

   sub arsect {
       my ($sz_query, $nssect, $n_ns, $cond) = @_;
       my ($space, $n_a, $n_a_aaaa, $n_p_aaaa, $ansect);
       $ansect = $sz_query + 1 + $sz_type + $sz_class;
       $space = $sz_msg - $sz_header - $ansect - $nssect;
       $n_a = atmost(int($space / $sz_rr_a), $n_ns);

       $n_a_aaaa = atmost(int($space
                              / ($sz_rr_a + $sz_rr_aaaa)), $n_ns);
       $n_p_aaaa = atmost(int(($space - $sz_rr_a * $n_ns)
                              / $sz_rr_aaaa), $n_ns);
       printf "For %s size query (%d byte):\n", $cond, $sz_query;
       printf "    only A is considered:        ";
       printf "# of A is %d (%s)\n", $n_a, &judge($n_a, $n_ns);
       printf "    A and AAAA are considered:   ";
       printf "# of A+AAAA is %d (%s)\n",
              $n_a_aaaa, &judge($n_a_aaaa, $n_ns);
       printf "    preferred-glue A is assumed: ";
       printf "# of A is %d, # of AAAA is %d (%s)\n",
           $n_a, $n_p_aaaa, &judge($n_p_aaaa, $n_ns);
   }

   sub judge {
       my ($n, $n_ns) = @_;
       return "green" if ($n >= $n_ns);
       return "yellow" if ($n >= 2);
       return "orange" if ($n == 1);
       return "red";
   }

   sub atmost {
       my ($a, $b) = @_;
       return 0 if ($a < 0);
       return $b if ($a > $b);
       return $a;
   }


The perl seems to run and work as advertised.
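
For what it's worth, the arithmetic can also be cross-checked independently. What follows is a minimal sketch in Python, not a port of the whole script: the per-name sizes are copied from the simulator output quoted in 3.2 rather than computed, and the function and variable names are mine.

```python
MSG, HEADER = 512, 12          # octets: message limit, fixed header
RR_FIXED = 2 + 2 + 2 + 4 + 2   # owner pointer, type, class, TTL, RDLENGTH
RR_A, RR_AAAA = 16, 28         # octets per glue RR, as in 2.11


def glue_counts(query_len, name_sizes):
    """Return (A only, A+AAAA pairs, AAAA after all A) glue counts."""
    n_ns = len(name_sizes)
    ns_section = sum(RR_FIXED + size for size in name_sizes)
    question = query_len + 1 + 2 + 2
    space = MSG - HEADER - question - ns_section

    def clamp(n):
        # Never negative, never more than the number of nameservers.
        return max(0, min(n, n_ns))

    return (clamp(space // RR_A),
            clamp(space // (RR_A + RR_AAAA)),
            clamp((space - RR_A * n_ns) // RR_AAAA))


dns_br = [10, 4, 4, 4]     # a.dns.br .. d.dns.br, sizes from the output
mixed = [16, 12, 13, 11]   # ns-ext.isc.org, ns.psg.com, ns.ripe.net, ns.eu.int

for label, sizes in (("dns.br", dns_br), ("mixed", mixed)):
    print(label, "maximum:", glue_counts(255, sizes))
    print(label, "average:", glue_counts(64, sizes))
```

The counts this produces agree with the figures in the quoted simulator output for both sets of nameservers.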

   6 - Security Considerations
   The recommendations contained in this document have no known security
   implications.

   7 - IANA Considerations

   This document does not call for changes or additions to any IANA
   registry.
   8 - Acknowledgement The authors thank Peter Koch and Rob Austein for
   their valuable comments and suggestions.


The text seems to have run on to the end of the section title, there.

   9 - Refrenaces


"References"

   [RFC1034] Mockapetris, P.V., "Domain names - Concepts and Facilities",
      RFC1034, November 1987.

   [RFC1035] Mockapetris, P.V., "Domain names - Implementation and
      Specification", RFC1035, November 1987.

   [RFC1123] Braden, R., Ed., "Requirements for Internet Hosts -
      Application and Support", RFC1123, October 1989.
   [RFC2308] Andrews, M., "Negative Caching of DNS Queries (DNS NCACHE)",
      RFC2308, March 1998.

   [RFC2181] Elz, R., Bush, R., "Clarifications to the DNS Specification",
      RFC2181, July 1997.

   [RFC2671] Vixie, P., "Extension Mechanisms for DNS (EDNS0)", RFC2671,
      August 1999.

   10 - Authors' Addresses

   Paul Vixie


"Internet Systems Consortium"?

      950 Charter Street
      Redwood City, CA 94063
      +1 650 423 1301
      [EMAIL PROTECTED]

   Akira Kato
      University of Tokyo, Information Technology Center
      2-11-16 Yayoi Bunkyo
      Tokyo 113-8658, JAPAN
      +81 3 5841 2750
      [EMAIL PROTECTED]



Joe

