[DNSOP] Re: Andy Newton's Discuss on draft-ietf-dnsop-ds-automation-08: (with DISCUSS and COMMENT)

Andy Newton Wed, 20 May 2026 15:27:02 -0700


On 20-05-2026 12:45 PM, Peter Thomassen wrote:


The rendered diff is currently visible at 
https://author-tools.ietf.org/iddiff?url1=draft-ietf-dnsop-ds-automation-08&url2=https://desec-io.github.io/draft-ietf-dnsop-ds-automation/draft-ietf-dnsop-ds-automation.txt.


Thanks for that. Very helpful.


On 5/20/26 15:38, Andy Newton wrote:

#### 5-15 Minutes

240        2.  Parent-side entities (such as registries) SHOULD reduce a DS
241            record set's TTL to a value between 5–15 minutes when a new set
242            of records is published, and restore the normal TTL value at a
243            later occasion (but not before the previous DS RRset's TTL has
244            expired).

Why isn't this a MUST? It is a SHOULD for both behaviors (the 5-15 minute TTL)
and the restoration of the TTL. At the very least, it seems the restoration of
the "normal" TTL value is a MUST. However, why isn't the 5-15 minute value a
MUST? Otherwise an operator can set it to 5 years justified with "because I
just wanted it that way". If the SHOULD is to remain, isn't there a reasonable
upper boundary behavior such as not being greater than the normal TTL value?


Three thoughts:

- TTLs accepted by resolvers are bounded in practice. Most do not accept 
arbitrarily long TTLs, as that has other security issues such as retaining a 
hijacked delegation beyond its actual removal. I can collect numbers if needed, 
but my recollection is that TTLs over 1-2d are not honored in practice.

- The above resolver-side constraint aside, parents can set whatever TTLs they 
want, even without DS automation. DS automation does not change that. The point 
of this provision in the document is to mitigate the impact of a botched DS 
update, not to give general recommendations about what TTLs should be used in a 
TLD registry. (In practice, TTLs in TLD zones vary between 1h and 48h.)

- It's not a MUST because there is no problem if the registry uses a TTL of 20 
or 30 minutes permanently, for example. Also, when DS automation is performed 
by the registrar, the registrar may not have the ability to enforce TTL 
adjustment. -- Now one could say that the document should take the stance that 
DS automation is to be done by registries and not registrars. However, getting 
into that discussion was very very clearly rejected by many participants: the 
registries and registrars want to sort out these responsibilities between 
themselves. And indeed, it's a business decision, not primarily a technical one.


If there are no interoperability issues with the registry setting 30 minutes as 
the TTL value, which is outside the 5 to 15 minute suggestion, then this is not 
a normative requirement and the SHOULD is to be lower-cased and/or changed to 
use non-BCP14 language.

Taking the DISCUSS seriously, please excuse me for mounting a discussion ;-) I
will lay out my understanding but eventually defer as I don't have the
necessary context on which documents may (not) make which particular use of
these key words across IETF areas. Other ADs may know better than I do, so I'd
prefer to eventually defer. I appreciate the learning opportunity for when I'll
be serving as an editor in the future!

First, RFC 2119 Section 6 (and the IESG statement [1]) mandates that uppercase key
words be used only when "actually required for interoperation or to limit behavior which has
potential for causing harm". The RFC itself details:

... potential for causing harm (e.g., limiting retransmisssions) For
example, they must not be used to try to impose a particular method
on implementors where the method is not required for
interoperability.

In this context, it seems to me that your view (~"a 30-minute TTL doesn't cause an
interoperability issue, so uppercase is inappropriate") is not compelling. For
example, a higher TTL does have a slight risk of causing retransmissions, namely when the
new DS record set is botched and causes SERVFAILs on lookups, which are not cacheable and
will cause lookup retries. When the DS TTL is set to 5-15 minutes instead of 30, there
will be fewer retries, as a correction to the DS RRset will be visible more quickly. The
DNSOP WG has determined that 5-15 minutes is the best, albeit not the only, choice to address
this risk. Yet, there is no interoperability issue when deviating a bit.

The IESG statement [1] also has a section on "Common Additional Uses of Key Words" which
endorses use of uppercase key words for operational requirements, and the 5-15 minute
recommendation arguably is an operational (soft) requirement and not just an opinion (that is,
"it's best done this way").


I will defer to your expertise regarding this being an operational requirement, 
but it is not clear to me that it is from the description. If it is an 
operational requirement, then why can't it be a MUST? If it is to be a SHOULD, 
then what are the ramifications of violating the SHOULD? Let me offer some text:

   Parent-side entities (such as registries) SHOULD reduce a DS
   record set's TTL to a value between 5–15 minutes when a new set
   of records is published. Using values below 5 minutes risks
   excessive queries, and using values greater than 15 minutes may impact
   recovery from operational mistakes. The parent-side SHOULD restore
   the previous (or, if unavailable, a default) TTL value at a later
   occasion (but not before the previous DS RRset's TTL has expired)
   to prevent operational issues arising within the client-side processes.

Hopefully I got the technical consequences correct.
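
As an illustration only, the timing in the text offered above can be sketched as follows. This is a minimal sketch, not anything taken from the draft: the function name `plan_ds_ttl`, its parameters, and the 10-minute default are hypothetical, and treating DS initialization (no previous RRset) as needing no waiting period is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Bounds taken from the "5-15 minutes" recommendation under discussion.
MIN_TEMP_TTL = timedelta(minutes=5)
MAX_TEMP_TTL = timedelta(minutes=15)


def plan_ds_ttl(published_at, previous_ttl, default_ttl,
                temp_ttl=timedelta(minutes=10)):
    """Return (temporary_ttl, ttl_to_restore, earliest_restore_time).

    previous_ttl is None for DS initialization, where no previous
    DS RRset exists (hypothetical convention for this sketch).
    """
    # Clamp the temporary TTL into the recommended 5-15 minute window.
    temporary = min(max(temp_ttl, MIN_TEMP_TTL), MAX_TEMP_TTL)
    # Restore the previous TTL or, if unavailable, the default one.
    to_restore = previous_ttl if previous_ttl is not None else default_ttl
    # The old RRset may stay cached for up to its own TTL after the new
    # set is published, so the restore must not happen before that.
    # Assumption: with no previous RRset there is nothing to wait for.
    wait = previous_ttl if previous_ttl is not None else timedelta(0)
    return temporary, to_restore, published_at + wait
```

For an update at noon with a previous TTL of 24 hours, the sketch yields a temporary TTL within the 5-15 minute window and an earliest restore time of noon the next day.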


To reiterate, this is my understanding, but I may not have the full picture 
across IETF areas.

[1]: 
https://datatracker.ietf.org/doc/statement-iesg-statement-on-clarifying-the-use-of-bcp-14-key-words/


Our discussion made me reconsider an earlier change I had confirmed in 
https://mailarchive.ietf.org/arch/msg/dnsop/bO0CX4FPyS907p0HZLF5Xilxbdo/, 
namely that the reporting recommendations in Section 5.1 should be lowercased.

That was based on the insight that, as you pointed out, reporting is not an 
interoperability requirement. However, the "Common Additional Uses of Key 
Words" section of [1] explicitly lists:

     - Operational requirements, especially around mandatory logging
       and configuration needed to produce successful deployments

It seems to me that reporting is pretty much the same as logging, except to a 
different party (namely, the child operator, the registrant and the like). The 
guidance in the IESG's statement is not limited to actual logging, it only 
names it especially, yet as an example. Section 5.1 thus appears appropriately 
covered by this clause. As a result, I've reverted this change.

As above, I'm not objecting to making any changes if I'm in the rough; I just 
don't know whether I am.


If your opinion is that a successful deployment requires reporting then it does 
sound like an operational requirement. Then the question is MUST or SHOULD. If 
SHOULD, then can the consequences of not following that SHOULD be described?

Perhaps the following text before the numbered entries in 5.1 helps: "One or more of 
the reporting methods described below MUST be implemented."

Also, is "normal TTL value" the previous TTL value? If not, what does that mean?


The "normal TTL value" is indeed the "previous" one when an update is applied. 
However, for DS initialization, there is no previous one, so the term would only capture updates.

Happy to change to a better adjective if it's not clear (pls let me know if you have 
another suggestion). OTOH, this was widely reviewed in the DNS and registry space (e.g., 
CENTR, APTLD, DNS OARC) and it seems like the use of "normal" was generally 
clear to the intended audience.


Some could have interpreted it as the previous value and others the default value by 
policy. I think it is best to be explicit. Perhaps "previous TTL value or default 
TTL value absent a previous value".


Sounds great. I've noted down the following change for the next revision:

OLD
    restore the normal TTL value

NEW
    restore the previous (or, if unavailable, default) TTL value


Yes, that is excellent.

#### Both CDNSKEY and CDS

246        3.  DNS operators SHOULD publish both CDNSKEY and CDS records, and
247            follow best practice for the choice of hash digest type
248            [DS-IANA].

Section 4.2.3 does a good job of explaining why both CDNSKEY and CDS are
needed, so it seems the justification here is a MUST. In other words, if you
want to interoperate, the operator MUST do this otherwise there is likely to be
a problem.


There is no problem if the child operator knows which type of update format 
(CDS or CDNSKEY) the parent consumes. There would only be a problem if the 
parent insisted on both being published. Indeed, that provision was part of the 
draft, but was removed in -01 due to argument that it's not a good idea to 
reject an update just because a redundant format is missing.

A written record of this argument can be found at 
https://mailarchive.ietf.org/arch/msg/dnsop/ObpPwt5_HrmsPXE3dG8CrJUP9mg/; more 
feedback in that direction was gathered during an in-person workshop at DNS 
OARC 45.

Given these points, it seems not generally required for interoperability to 
publish both CDS and CDNSKEY, although if you do so you can be agnostic about 
the parent's preference and avoid certain risks. But, if you know what you're 
doing and what the parent expects, the risk goes away. Hence the SHOULD.

See also 
https://mailarchive.ietf.org/arch/msg/dnsop/Uv-oyqj-gp1dlfPIofct22Ijqo0/.
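
For context on why one format is redundant given the other: the fields of a CDS record can be derived from the corresponding CDNSKEY, using the DS digest construction and key tag algorithm of RFC 4034 (Section 5.1.4 and Appendix B). The following is a minimal sketch of that derivation for digest type 2 (SHA-256); the function names are hypothetical and the key material in the usage note is a placeholder, not a real key.

```python
import hashlib
import struct


def name_to_wire(name):
    """Canonical (lowercase) wire format of a domain name."""
    labels = [l for l in name.lower().rstrip(".").split(".") if l]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"


def dnskey_rdata(flags, protocol, algorithm, public_key):
    """(C)DNSKEY RDATA: flags (2 bytes), protocol, algorithm, public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata):
    """Key tag per RFC 4034 Appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def cds_from_cdnskey(owner, flags, protocol, algorithm, public_key):
    """Return (key_tag, algorithm, digest_type, hex_digest) for SHA-256."""
    rdata = dnskey_rdata(flags, protocol, algorithm, public_key)
    # DS digest = hash(owner name in wire format | DNSKEY RDATA)
    digest = hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()
    return key_tag(rdata), algorithm, 2, digest
```

A call such as `cds_from_cdnskey("example.com.", 257, 3, 13, key_bytes)` returns the four CDS fields. The reverse direction is not possible, since the digest does not reveal the key, which is one reason a parent's preferred format matters to the child.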


Why can't the guidance be more explicit? "Unless the child and parent have agreed 
upon using either CDS or CDNSKEY, DNS operators MUST publish both CDNSKEY and CDS."


Such agreement is not to be expected in practice, as domain registries generally do not 
have any sort of agreement with DNS child operators. There may be rare exceptions (which 
"SHOULD" would accommodate), but the registry's preferred ingestion format is 
normally published in their DNSSEC Practice Statement.

Additionally, such agreements are something to actually advise against, because they 
ossify expectations and prevent the registry from changing its preferred format later. If 
either CDS or CDNSKEY is eventually deprecated (and removing this dichotomy would be a 
good thing), at least some registries would have to make that change, and agreements like 
the above would stand in the way. (That's why my point earlier was on the child operator 
"knowing" the preference, such as from a DNSSEC Practice Statement, not the 
registry committing to it in an agreement.)


Is your objection that once an agreement is reached it will not change? Or that 
it is hard to change?

I guess the onus can be placed on the child entirely: "Unless the child has 
knowledge of the parent's preference, both CDNSKEY and CDS MUST be published."


Aren't the risks a child operator might be incurring covered by the sentence 
you proposed in 
https://mailarchive.ietf.org/arch/msg/dnsop/8hgHnyVsxVYSyGcn6oBUJT9GGzo/? I've 
already added this for the next revision:

    When implementing these recommendations, operators MUST mitigate
    issues arising from any particular deviation.


In general, yes. But we should be more helpful if we can.

----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

[...]

### RDAP TTLs

410        5.  The currently active DS configuration SHOULD be made accessible
411            to the registrant (or their designated party) through the
412            customer portal available for domain management.  The DS update
413            history MAY be made available in the same way.

https://datatracker.ietf.org/doc/draft-ietf-regext-rdap-ttl-extension/ is also
a good way to make the currently active TTL values available to the registrant
or their designated party.


I will make a note in Section 5.2. My impression is that publicly stating the 
TTL policy would belong in DNSSEC Practice Statements, so I'd be reluctant to 
include it as a recommendation.


I agree. That is only informative, as is all of item 5.

I beg to disagree: As mentioned above, uppercase key words are fine "to limit 
behavior which has potential for causing harm". Operational harm is certainly 
imaginable when a registrant can't inspect their current configuration after it has been 
changed through DS automation, and the registrant wrongly assumes a different state while 
operating their nameserver (I'll be happy to come up with some specific instances where 
that could play a role).


I agree. Can that be put into the draft? "Failure to provide the registrant a means 
to inspect their current configuration after it has been changed may leave the registrant 
unable to recover from operational incidents, because the registrant may have 
out-of-date information."

-andy, ART AD

_______________________________________________
DNSOP mailing list -- [email protected]
To unsubscribe send an email to [email protected]
