Hi Paul,

(with apologies for the breakfast/iPad MIME crime that surely follows)

> On Feb 8, 2018, at 01:02, Paul Wouters <p...@nohats.ca> wrote:
>> On Wed, 7 Feb 2018, Robert Story wrote:
>>> On Wed 2018-02-07 10:43:16-0500 Paul wrote:
>>> How about using this query to also encode an
>>> uptime-processstartedtime value? Maybe with accurancy reduced to
>>> minutes. I think that would return valuable data.
>> -1 for feature creep and the technical reasons Joe mentioned.
> We have a giant hole in our understanding of why there are updated
> nameservers running the latest software with the older keys. We
> need to gain understanding and we know we need more data.

I don't disagree with the need for more data, but I think the hole you mention 
is not so giant. As far as I can tell it's a result of:

1. RFC5011 support not being turned on in nameservers that have been upgraded 
but whose older, DNSSEC-validating configuration has been preserved across 
updates (most cases), and

2. RFC5011 support exercising a code path that requires a writable, persistent 
filesystem to store an updated trust anchor, which turns out not to be 
available (fewer, but some cases).
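
To make the second case concrete, a start script could probe the trust anchor directory before relying on RFC5011. What follows is only a minimal sketch in Python: the function name is hypothetical and not taken from any nameserver, and it only shows that a write succeeds now, not that the filesystem will survive a reboot.

```python
import os
import tempfile


def can_store_trust_anchor(directory):
    """Return True if a file can be created, synced and removed in `directory`.

    Hypothetical helper: a successful write today says nothing about
    whether the filesystem is persistent across restarts.
    """
    try:
        # Fails with OSError if the directory is missing or read-only.
        fd, path = tempfile.mkstemp(dir=directory, prefix=".ta-probe-")
    except OSError:
        return False
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the data to storage, not just the page cache
    except OSError:
        return False
    finally:
        os.close(fd)
        try:
            os.unlink(path)  # leave no probe file behind
        except OSError:
            pass
    return True
```

A False result here would be a reason to fetch the trust anchor some other way at startup rather than to assume RFC5011 will keep it current.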

These are both BIND9 problems, which I mean as a compliment since they are
indicative of (a) widespread use, (b) early implementation of DNSSEC and (c) a
high degree of backwards compatibility in configuration.

The larger question of whether RFC5011 is a practical or sufficient mechanism 
given this experience is a reasonable one. You may recall I have been a serial 
advocate for adding standardised bootstrap mechanisms that include fetching a 
trust anchor out-of-band, for example, which I still think would be a practical 
remedy even if a slightly inelegant one; unbound-anchor and its use in package 
and system start scripts is, I think, a key reason why the two problems 
described above don't show up in unbound.

My sense from the recent KSK rollover/RFC8145 data collection experience is 
that the actual impact on end-users from validators dependent on the outgoing 
KSK is very small. This is hard to quantify with precision, however, because we 
are not able to measure the state of most resolvers (e.g. those not reporting 
via RFC 8145 or not validating), nor assess their operational impact (e.g. size 
of end-user population and impact of validation failures upon them) with any 
degree of accuracy.

I think that the sentinel approach of measuring end-user impact from the 
end-user perspective gets us much closer to useful data in general. However, 
it's not clear to me how even a trusted, accurate sense of uptime across all 
resolvers would help with those questions.
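
For reference, the end-user measurement I have in mind reduces to a small truth table. This is a sketch of the three-query classification as I understand it from the sentinel proposal, written in Python: the function and argument names are mine, and the category labels (Vnew, Vold, Vind, nonV, other) are assumed from that proposal rather than anything in this thread. Each argument records whether the client got an answer (True) or a SERVFAIL (False) for the "is-ta" name, the "not-ta" name, and a deliberately bogus-signed name.

```python
def classify_resolver(is_ta_ok, not_ta_ok, bogus_ok):
    """Classify a client's resolver path from three query outcomes.

    True means the name resolved; False means SERVFAIL.
    Names and labels are illustrative assumptions, not a specified API.
    """
    if bogus_ok:
        # A bogus-signed name resolved, so nothing on the path validates.
        return "nonV" if (is_ta_ok and not_ta_ok) else "other"
    if is_ta_ok and not_ta_ok:
        # Validates, but does not implement the sentinel mechanism.
        return "Vind"
    if is_ta_ok:
        # Validates and has the new key as a trust anchor.
        return "Vnew"
    if not_ta_ok:
        # Validates and does not have the new key as a trust anchor.
        return "Vold"
    return "other"
```

None of these outcomes depends on how long the resolver has been running, which is why I don't see where an uptime value would fit.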

DNSOP mailing list
