Yesterday I published a test of root server response times and failure
rates, which seems to have caused a bit of an uproar among root server
operators and much discussion.  As is my usual habit, I published the
results without much explanation, and I apologize for all the concern it
has caused.

To that end I am providing a detailed explanation of the results.  First
of all, the original message, which is indexed at:

http://dns.pccf.net/root-servers/test.2000.04.26/root-server-statistics-msg.txt

had the following statement in it:

> Here are the results of the first test.  Please note - IDNS root server
> b.i-dns.net. (203.37.255.102) is down due to a subway fire in
> Washington.  We have been notified by the admin that it will be up
> A.S.A.P.  ORSC pine.higgs.net. (204.80.125.130) and spruce.higgs.net. 
> (204.80.125.145) are down for maintenance.

Everything above is true.

> The best results and response time came from USG and TINC roots.

However, this is false.  I did not write this statement; it came from a
stats student who assisted me in compiling the results and gave his
observations on the data.

In order to understand the results one must understand the answers and the
nature of the response.  As has been discussed on various lists, there are
two types of root servers in operation worldwide.  Some servers only
provide pointers to where additional answers to a query can be
found.  Usually these are gTLD or ccTLD zones.  Other servers give
recursive, or complete, answers to the question.  These roots query the
ccTLD or gTLD zones, get pointers to the zone's DNS servers, and then go
one step further and query those servers for an authoritative answer.
This results in substantial differences in response time between recursive
and non-recursive roots.  It also results in a higher failure rate if the
zone is bogus or if any of the servers queried during the recursive
process fail.
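
The difference can be illustrated with a toy model.  This is only a
sketch: the round-trip times and success probabilities below are made-up
placeholders, not measurements from the test, and the model assumes that
lookups happen one after another and that servers fail independently.

```python
def pointer_time(root_rtt_ms):
    """A pointer-only root answers after a single round trip."""
    return root_rtt_ms


def recursive_time(root_rtt_ms, downstream_rtts_ms):
    """A recursive root also waits on every server it queries for the client."""
    return root_rtt_ms + sum(downstream_rtts_ms)


def chain_success_probability(per_server_success):
    """Every server in the chain must answer (independence is assumed)."""
    prob = 1.0
    for p in per_server_success:
        prob *= p
    return prob


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the shape of the effect.
    root_rtt = 100            # client <-> root
    downstream = [120, 150]   # root <-> TLD server, root <-> zone server
    print(pointer_time(root_rtt))                 # one hop only
    print(recursive_time(root_rtt, downstream))   # root plus two more hops
    print(chain_success_probability([0.99, 0.98, 0.97]))
```

Each extra server in the recursive chain adds its own delay and its own
chance of failure, which is why the two kinds of root cannot be compared
on raw response time alone.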

A detailed discussion of this is available in a communication I had with
Mr. Sexton of the ORSC and is indexed at:

http://dns.pccf.net/root-servers/test.2000.04.26/orsc-response-jlb.txt

I have recompiled the statistics so they now display which servers do
recursive queries and which servers provide pointers.  The new data is
indexed at:

http://dns.pccf.net/root-servers/test.2000.04.26/server-stats.txt.gz

For those die-hard DNS fanatics who love to look at data, the questions
file is located at:

http://dns.pccf.net/root-servers/test.2000.04.26/dns-questions.gz

The test data is indexed at: 

http://dns.pccf.net/root-servers/test.2000.04.26/dump-test-3-file-temp.gz

You'll also need the root server map, which contains the ROOT SERVER CODE
NAME that cross-references the test data to each server's host name and
IPv4 address.  That file is indexed at:

http://dns.pccf.net/root-servers/test.2000.04.26/root-servers-map.gz

Now, 322 queries were made from a population of 144,336 potential
questions.  Statistically speaking, this results in a margin of error of
about +/- 5% to 6% at a confidence level of 95%, assuming a 50%
proportion.  If any stats people are out there they'll understand.
However, stats and the internet go together like oil and vinegar, so I
would not put my eggs in that basket.
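
The 5% to 6% figure is consistent with the usual margin-of-error formula
for a sampled proportion with a finite population correction.  The sketch
below shows that calculation.  The worst-case proportion p = 0.5 and
z = 1.96 for 95% confidence are assumptions, not something stated with the
test results.

```python
import math


def margin_of_error(sample_size, population_size, p=0.5, z=1.96):
    """Margin of error for a proportion, with finite population correction.

    p=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    Both defaults are assumptions made for this illustration.
    """
    standard_error = math.sqrt(p * (1 - p) / sample_size)
    correction = math.sqrt(
        (population_size - sample_size) / (population_size - 1)
    )
    return z * standard_error * correction


if __name__ == "__main__":
    moe = margin_of_error(322, 144336)
    print(f"margin of error: +/- {moe * 100:.1f}%")
```

With 322 queries out of 144,336 questions the correction term is almost
exactly 1, so the result is driven by the sample size alone.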

If you look up ns1.vrx.net on the stats page:

http://dns.pccf.net/root-servers/test.2000.04.26/server-stats.txt.gz

you will notice that ns1.vrx.net has two IPv4 addresses associated with
it, 199.166.24.1 and 204.138.71.254.  The average response time is 992 ms
for 199.166.24.1 and 364 ms for 204.138.71.254.  This represents a
difference of 628 ms.  However, this is the same server.  So based on this
little fact alone I don't trust this data.  I think the sample is too
small to obtain correct averages.

Also, you will notice that the minimum response time for all servers is 2
ms on ns1.diebold.net. (205.189.73.10).  That's fast, right?  Not really.
In fact ns1.diebold.net is next door to pccf, so a 2 ms response time is
understandable.

Again, let me stress that this was a first test, designed more for
guidance than anything else.

The critical numbers here are the average response time per server and
whether the server provides full answers (recursive) or partial answers
(pointers).  In the case of second level domain queries the ORSC servers
had a minimum average response time of 364 ms.  The minimum response time
for USG roots was 272 ms.  The ORSC server does recursion and has provided
a complete answer in 364 ms, while the USG root has only provided a
pointer to the gTLD or ccTLD zone.  In this case the ORSC root is faster.

The same applies if we compare maximum averages.  For the ORSC roots the
maximum was 4717 ms and for the USG roots it was 1713 ms.  In this case
the lag time is about the same.
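
For reference, the ratios between the ORSC and USG figures quoted above
can be worked out directly.  This is only arithmetic on the four numbers
in this message; the variable and function names are illustrative.

```python
# Average response times in ms, as quoted in this message.
ORSC = {"min_avg_ms": 364, "max_avg_ms": 4717}
USG = {"min_avg_ms": 272, "max_avg_ms": 1713}


def ratio(recursive_ms, pointer_ms):
    """How many times longer the recursive answer took than the pointer."""
    return recursive_ms / pointer_ms


if __name__ == "__main__":
    for key in ("min_avg_ms", "max_avg_ms"):
        print(key, round(ratio(ORSC[key], USG[key]), 2))
```

The recursive figure is a complete answer, while the pointer figure covers
only the first step of a lookup, so a ratio above 1 does not by itself
mean the recursive root is slower for the end user.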

I'm now making an educated guess here, but it would seem to me that
recursive servers have a response time roughly three times that of
servers which only provide pointers.  People whose local DNS uses servers
which only provide pointers will most likely experience greater lag, or
at least greater potential for lag, as queries are propagated through the
DNS tree.  If a recursive server were pointed to via a hints cache file on
the local DNS, I expect the same lag time would apply.  However, until I
have results from a month or more of testing I'm not sure I want to put
all my eggs in that basket either.

A number of people have asked me how to improve root response times.  Yes,
it is possible.

The best way for ISPs to provide their clients with fast DNS responses is
to take all root servers out of the loop and become their own root.
Transferring the zone from any root server authority and hosting it on
your own machine removes the root servers from the lookup path completely.
This can be done by secondarying the "." root in the same way as one would
with any zone or domain name.  When the root is updated, your copy will
automatically update itself.

Regards
Joe Baptista

