Hi Andy,

Thanks again for engaging in detail. The text suggestions are also appreciated!

The changes from this iteration are summarized in 
https://github.com/desec-io/draft-ietf-dnsop-ds-automation/commit/d9fed34480c571be3b49830c966dba264078090f.

As before, the rendered diff from the last revision is visible at 
https://author-tools.ietf.org/iddiff?url1=draft-ietf-dnsop-ds-automation-08&url2=https://desec-io.github.io/draft-ietf-dnsop-ds-automation/draft-ietf-dnsop-ds-automation.txt.

For detailed responses, please see below.

On 5/21/26 00:26, Andy Newton wrote:
#### 5-15 Minutes

240        2.  Parent-side entities (such as registries) SHOULD reduce a DS
241            record set's TTL to a value between 5–15 minutes when a new set
242            of records is published, and restore the normal TTL value at a
243            later occasion (but not before the previous DS RRset's TTL has
244            expired).

Why isn't this a MUST? It is a SHOULD for both behaviors (the 5-15 minute TTL)
and the restoration of the TTL. At the very least, it seems the restoration of
the "normal" TTL value is a MUST. However, why isn't the 5-15 minute value a
MUST? Otherwise an operator can set it to 5 years justified with "because I
just wanted it that way". If the SHOULD is to remain, isn't there a reasonable
upper boundary behavior such as not being greater than the normal TTL value?

Three thoughts:

- TTLs accepted by resolvers are bounded in practice. Most do not accept 
arbitrarily long TTLs, as that has other security issues such as retaining a 
hijacked delegation beyond its actual removal. I can collect numbers if needed, 
but my recollection is that TTLs over 1-2d are not honored in practice.

- The above resolver-side constraint aside, parents can set whatever TTLs they 
want, even without DS automation. DS automation does not change that. The point 
of this provision in the document is to mitigate the impact of a botched DS 
update, not to give general recommendations about what TTLs should be used in a 
TLD registry. (In practice, TTLs in TLD zones vary between 1h and 48h.)

- It's not a MUST because there is no problem if the registry uses a TTL of 20 
or 30 minutes permanently, for example. Also, when DS automation is performed 
by the registrar, the registrar may not have the ability to enforce TTL 
adjustment. -- Now one could say that the document should take the stance that 
DS automation is to be done by registries and not registrars. However, getting 
into that discussion was very clearly rejected by many participants: the 
registries and registrars want to sort out these responsibilities between 
themselves. And indeed, it's a business decision, not primarily a technical one.

If there are no interoperability issues with the registry setting 30 minutes as 
the TTL value, which is outside the 5 to 15 minute suggestion, then this is not 
a normative requirement and the SHOULD is to be lower-cased and/or changed to 
use non-BCP14 language.

Taking the DISCUSS seriously, please excuse me for mounting a discussion ;-) I 
will lay out my understanding but eventually defer as I don't have the 
necessary context on which documents may (not) make which particular use of 
these key words across IETF areas. Other ADs may know better than I do, so I'd 
prefer to eventually defer. I appreciate the learning opportunity for when I'll 
be serving as an editor in the future!

First, RFC 2119 Section 6 (and the IESG statement [1]) mandates that uppercase key words 
be used only when "actually required for interoperation or to limit behavior which has 
potential for causing harm". The RFC itself details:

    ... potential for causing harm (e.g., limiting retransmisssions)  For
    example, they must not be used to try to impose a particular method
    on implementors where the method is not required for
    interoperability.

In this context, it seems to me that your view (~"a 30-minute TTL doesn't cause an 
interoperability issue, so uppercase is inappropriate") is not compelling. For 
example, a higher TTL does carry a slight risk of causing retransmissions, namely when the 
new DS record set is botched and causes SERVFAILs on lookups, which are not cacheable and 
will cause look-up retries. When the DS TTL is set to 5-15 minutes instead of 30, there 
will be fewer retries, as a correction to the DS RRset will be visible more quickly. The 
DNSOP WG has determined that 5-15 minutes is the best, albeit not the only, choice to address 
this risk. Yet, there is no interoperability issue when deviating a bit.

The IESG statement [1] also has a section on "Common Additional Uses of Key Words" which 
endorses use of uppercase key words for operational requirements, and the 5-15 minute recommendation 
arguably is an operational (soft) requirement and not just an opinion (that is, "it's best 
done this way").

I will defer to your expertise regarding this being an operational requirement, 
but it is not clear to me that it is from the description. If it is an 
operational requirement, then why can't it be a MUST? If it is to be a SHOULD 
then what are the ramifications of violating the SHOULD? Let me offer some text:

    Parent-side entities (such as registries) SHOULD reduce a DS
    record set's TTL to a value between 5–15 minutes when a new set
    of records is published. Using values below 5 minutes risks
    excessive queries, and using values greater than 15 minutes may impact
    recovery from operational mistakes. The parent-side SHOULD restore
    the previous (or, if unavailable, a default) TTL value at a later
    occasion (but not before the previous DS RRset's TTL has expired)
    to prevent operational issues arising within the client-side processes.

Hopefully I got the technical consequences correct.

You got them perfectly correct :-) And I like the words you added.

The document's structure is that all recommendations are given in sections *.1 
(and consolidated in Appendix A), whereas considerations around the why and 
what-if-not are all in sections *.2. The currently proposed text conflates 
that, for a single recommendation only.

I've thus taken your words and put them in the corresponding analysis section 
4.2:

OLD
   Registries therefore should significantly lower the DS RRset's TTL
   for some time following bootstrapping or an update.  Pragmatic values
   for the reduced TTL value range between 5–15 minutes.  Such low TTLs
   might be expected to cause increased load on the corresponding
   authoritative nameservers; however, recent measurements have
   demonstrated them to have negligible impact on the overall load of a
   registry's authoritative nameserver infrastructure [LowTTL].

NEW
   Registries therefore should significantly lower the DS RRset's TTL
   for some time following bootstrapping or an update.  Pragmatic values
   for the reduced TTL value range between 5–15 minutes.  Using values
   below 5 minutes risks excessive queries, and using values greater
   than 15 minutes may impact recovery from operational mistakes.

   Note that recent measurements have demonstrated low TTLs like the
   above to have negligible impact on the overall load of a registry's
   authoritative nameserver infrastructure [LowTTL].
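
As a rough illustration of the timing in this provision, here is a minimal Python
sketch. It is not taken from the draft: the names `plan_ds_update` and `DsTtlPlan`
and the 10-minute default are assumptions made purely for illustration. It only
encodes the two constraints discussed above, namely that the reduced TTL falls
within 5-15 minutes and that the normal TTL is not restored before the previous
DS RRset's TTL has expired.

```python
from dataclasses import dataclass

# Recommended range for the temporarily reduced DS TTL, in seconds (5-15 minutes).
REDUCED_TTL_RANGE = (5 * 60, 15 * 60)


@dataclass
class DsTtlPlan:
    reduced_ttl: int       # TTL (seconds) to publish with the new DS RRset
    earliest_restore: int  # earliest time (epoch seconds) to restore the normal TTL


def plan_ds_update(now: int, previous_ttl: int, reduced_ttl: int = 600) -> DsTtlPlan:
    """Hypothetical helper: plan the TTL handling for one DS RRset update.

    now          -- publication time of the new DS RRset (epoch seconds)
    previous_ttl -- TTL (seconds) the previous DS RRset was served with
    reduced_ttl  -- temporary TTL for the new RRset (assumed default: 10 minutes)
    """
    low, high = REDUCED_TTL_RANGE
    if not low <= reduced_ttl <= high:
        raise ValueError("reduced TTL outside the recommended 5-15 minute range")
    # Resolvers may have cached the previous RRset just before the update, so
    # it can linger until now + previous_ttl; do not restore before that.
    return DsTtlPlan(reduced_ttl=reduced_ttl, earliest_restore=now + previous_ttl)
```

Since the provision is a SHOULD, a real implementation might log a warning
rather than raise an error for out-of-range values; the strict check here just
keeps the sketch short.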

To reiterate, this is my understanding, but I may not have the full picture 
across IETF areas.

[1]: 
https://datatracker.ietf.org/doc/statement-iesg-statement-on-clarifying-the-use-of-bcp-14-key-words/


Our discussion made me reconsider an earlier change I had confirmed in 
https://mailarchive.ietf.org/arch/msg/dnsop/bO0CX4FPyS907p0HZLF5Xilxbdo/, 
namely that the reporting recommendations in Section 5.1 should be lowercased.

That was based on the insight that, as you pointed out, reporting is not an 
interoperability requirement. However, the "Common Additional Uses of Key 
Words" section of [1] explicitly lists:

     - Operational requirements, especially around mandatory logging
       and configuration needed to produce successful deployments

It seems to me that reporting is pretty much the same as logging, except to a 
different party (namely, the child operator, the registrant and the like). The 
guidance in the IESG's statement is not limited to actual logging; it merely 
singles logging out as an example. Section 5.1 thus appears appropriately 
covered by this clause. As a result, I've reverted this change.

As above, I'm not objecting to making any changes if I'm in the rough; I just 
don't know whether I am.

If your opinion is that a successful deployment requires reporting then it does 
sound like an operational requirement. Then the question is MUST or SHOULD. If 
SHOULD, then can the consequences of not following that SHOULD be described?

Perhaps the following text before the numbered entries in 5.1 helps: "One or more of 
the reporting methods described below MUST be implemented."

The different reporting methods address different aspects, so it's not like 
implementing one mitigates the adverse effects of not implementing another. 
Rather, they are all very useful for debugging purposes, to prevent confusion 
etc. (like logging, so it's covered by BCP 14), but they are not crucial in 
order to make DS automation work when everything runs smoothly. It's thus very 
reasonable to implement these, but MUST seems too much.

We could add that changes and/or errors may go unnoticed if reporting is skipped, and 
then whatever consequence is imaginable from that may ensue. However, that's quite 
obvious, and it doesn't help the reader to add text that's not meaningful. The analysis 
section on reporting (5.2) contains a lot of reasoning about the topic, and you may 
regard that whole section as "qualifying the SHOULD".

Note that the IESG statement on BCP 14 does not request that "SHOULD exceptions & consequences" be 
explained immediately adjacent to where the key word is used. It only points out that providing "readers 
with all of the details they need to make an informed decision" is more "valuable" than not doing 
so. I really think that the interplay of sections 5.1 and 5.2 (mirroring the general structure of the document) 
is suitable for that purpose.

#### Both CDNSKEY and CDS

246        3.  DNS operators SHOULD publish both CDNSKEY and CDS records, and
247            follow best practice for the choice of hash digest type
248            [DS-IANA].

Section 4.2.3 does a good job of explaining why both CDNSKEY and CDS are
needed, so it seems the justification here is a MUST. In other words, if you
want to interoperate, the operator MUST do this otherwise there is likely to be
a problem.

There is no problem if the child operator knows which type of update format 
(CDS or CDNSKEY) the parent consumes. There would only be a problem if the 
parent insisted on both being published. Indeed, that provision was part of the 
draft, but was removed in -01 due to the argument that it's not a good idea to 
reject an update just because a redundant format is missing.

A written record of this argument can be found at 
https://mailarchive.ietf.org/arch/msg/dnsop/ObpPwt5_HrmsPXE3dG8CrJUP9mg/; more 
feedback in that direction was gathered during an in-person workshop at DNS 
OARC 45.

Given these points, it seems not generally required for interoperability to 
publish both CDS and CDNSKEY, although if you do so you can be agnostic about 
the parent's preference and avoid certain risks. But, if you know what you're 
doing and what the parent expects, the risk goes away. Hence the SHOULD.

See also 
https://mailarchive.ietf.org/arch/msg/dnsop/Uv-oyqj-gp1dlfPIofct22Ijqo0/.

Why can't the guidance be more explicit? "Unless the child and parent have agreed 
upon using either CDS or CDNSKEY, DNS operators MUST publish both CDNSKEY and CDS."

Such agreement is not to be expected in practice, as domain registries generally do not 
have any sort of agreement with DNS child operators. There may be rare exceptions (which 
"SHOULD" would accommodate), but the registry's preferred ingestion format is 
normally published in their DNSSEC Practice Statement.

Additionally, such agreements are something to actually advise against, because they 
ossify expectations and prevent the registry from changing its preferred format later. If 
either CDS or CDNSKEY is eventually deprecated (and removing this dichotomy would be a 
good thing), at least some registries would have to make that change, and agreements like 
the above would stand in the way. (That's why my point earlier was on the child operator 
"knowing" the preference, such as from a DNSSEC Practice Statement, not the 
registry committing to it in an agreement.)

Is your objection that once an agreement is reached it will not change? Or that 
it is hard to change?

Mainly the latter; there's always a long tail. We therefore should not 
encourage such agreements (and the associated ossification).

I guess the onus can be placed on the child entirely: "Unless the child has 
knowledge of the parent's preference, both CDNSKEY and CDS MUST be published."

Brilliant! Sometimes solutions are so obvious when you see them. Reworded 
slightly (so that the acting party is at the beginning of the sentence for 
catchiness):

OLD
   3.  DNS operators SHOULD publish both CDNSKEY and CDS records, and
       follow best practice for the choice of hash digest type
       [DS-IANA].

NEW
   3.  DNS operators MUST publish both CDNSKEY and CDS records (unless
       the parent's preference is known), and follow best practice for
       the choice of hash digest type [DS-IANA].
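
To make the reworded rule concrete, here is a small Python sketch of the
child-side decision. It is only an illustration under my own assumptions: the
function name `records_to_publish` and the idea of passing the parent's
preference as a string are hypothetical, not anything the draft specifies.

```python
from typing import Optional

# The two record types a child operator can publish to signal DS updates.
BOTH = frozenset({"CDS", "CDNSKEY"})


def records_to_publish(parent_preference: Optional[str] = None) -> frozenset:
    """Hypothetical helper: pick which signaling record types to publish.

    parent_preference -- "CDS" or "CDNSKEY" if the parent's ingestion format
                         is known (e.g. from its DNSSEC Practice Statement),
                         otherwise None.
    """
    if parent_preference is None:
        # Preference unknown: publish both, staying agnostic about the parent.
        return BOTH
    if parent_preference not in BOTH:
        raise ValueError("preference must be 'CDS' or 'CDNSKEY'")
    # Preference known: publishing only that type is sufficient.
    return frozenset({parent_preference})
```

Publishing both remains permissible even when the preference is known; the
sketch merely shows the minimum that the reworded text requires.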

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------
[...]
### RDAP TTLs

410        5.  The currently active DS configuration SHOULD be made accessible
411            to the registrant (or their designated party) through the
412            customer portal available for domain management.  The DS update
413            history MAY be made available in the same way.

https://datatracker.ietf.org/doc/draft-ietf-regext-rdap-ttl-extension/ is also
a good way to make the currently active TTL values available to the registrant
or their designated party.

I will make a note in Section 5.2. My impression is that publicly stating the 
TTL policy would belong in DNSSEC Practice Statements, so I'd be reluctant to 
include it as a recommendation.

I agree. That is only informative, as is all of item 5.

I beg to disagree: As mentioned above, uppercase key words are fine "to limit 
behavior which has potential for causing harm". Operational harm is certainly 
imaginable when a registrant can't inspect their current configuration after it has been 
changed through DS automation, and the registrant wrongly assumes a different state while 
operating their nameserver (I'll be happy to come up with some specific instances where 
that could play a role).

I agree. Can that be put into the draft? "Failure to provide the registrant a means 
to inspect their current configuration after it has been changed may cause the inability 
of the registrant to recover from operational incidents because the registrant may have 
out-of-date information."

Sounds good. I've made the following change:

OLD
   The registrant (or their designated party) should be able to retrieve
   the current DS configuration through the customer portal available
   for domain management.

NEW
   The registrant (or their designated party) should be able to retrieve
   the current DS configuration through the customer portal available
   for domain management.  Failure to provide the registrant a means to
   inspect the current configuration after it has been changed may
   hinder recovery from operational incidents because the registrant may
   have out-of-date information.

Best,
Peter
