[dnsop] Re: comments on dnsop draft

Olaf M. Kolkman Wed, 08 Jun 2005 05:01:23 -0700

On Mon, 6 Jun 2005 16:39:46 -0400
Edward Lewis <[EMAIL PROTECTED]> wrote:


> Context from the doc starts w/ "#", comments in-line...

Hi Ed,

Thanks a lot for the comments. 

I've skipped all style and spelling comments and make further
comments in-line. Wherever I could, I suggested alternative text.

I've tried to separate issues by horizontal lines.

Since I would really like to have this document out of our way before
the next IETF I would appreciate review and input from the WG on the
questions raised below.

If we do not get an answer on the questions raised we will probably
use whatever we suggested below. After we've updated the document we'll
ask for yet another WG last call.



--Olaf


----------------------------------------------------------------------
> 
> #1.2  Time Definitions
> 
> #   o  "Key effectivity period"
> #         The period which a key pair is expected to be effective.  This
> #         period is defined as the time between the first inception time
> #         stamp and the last expiration date of any signature made with
> #         this key.
> #         The key effectivity period can span multiple signature validity
> #         periods.
> 
> Can it be discontinuous?  I.e., only on Tuesdays and Thursdays in May?

I do not know what the "formal definition" is. 

But I take it to be that even when signatures made with this key
appear with a sig-inception Tuesday 00:00 and sig-expiration Tuesday
23:59 and signatures with sig-inception Thursday 00:00 and
sig-expiration Thursday 23:59, the Key effectivity period would be
Tuesday 00:00 to Thursday 23:59.

If there is a better way of phrasing this definition than what appears
now, I will need text.
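
To make my reading of the definition concrete, here is a minimal sketch.
It assumes signatures are given as (inception, expiration) pairs; the
function name and the example dates are only illustrations, not text
from the draft.

```python
from datetime import datetime


def key_effectivity_period(signatures):
    """Return (first inception, last expiration) over all signatures of one key.

    `signatures` is an iterable of (inception, expiration) datetime pairs.
    Gaps between signature validity periods do not split the result.
    """
    signatures = list(signatures)
    if not signatures:
        raise ValueError("no signatures made with this key")
    first_inception = min(inception for inception, _ in signatures)
    last_expiration = max(expiration for _, expiration in signatures)
    return first_inception, last_expiration


# A Tuesday and a Thursday in May 2005 (dates picked only for illustration).
tuesday = (datetime(2005, 5, 3, 0, 0), datetime(2005, 5, 3, 23, 59))
thursday = (datetime(2005, 5, 5, 0, 0), datetime(2005, 5, 5, 23, 59))
period = key_effectivity_period([tuesday, thursday])
```

Under this reading the period runs from Tuesday 00:00 to Thursday 23:59,
even though no signature is valid on Wednesday.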

----------------------------------------------------------------------


> #   o  "Maximum/Minimum Zone TTL"
> #         The maximum or minimum value of the TTLs from the complete set
> #         of RRs in a zone.
> 
> This has nothing to do with the last number in the SOA RR, right?  Right?
>

Correct: you take all the TTL values in the zone and take the minimum
and the maximum of that set. Is the text that is there now ambiguous?
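
As a minimal sketch of what I mean, assuming the zone is available as a
list of (owner, TTL, type) tuples (a made-up representation, not any
particular tool's format):

```python
def zone_ttl_bounds(rrs):
    """Return (minimum zone TTL, maximum zone TTL) over all RRs in a zone.

    `rrs` is an iterable of (owner, ttl_seconds, rrtype) tuples.
    """
    ttls = [ttl for _, ttl, _ in rrs]
    if not ttls:
        raise ValueError("zone contains no RRs")
    return min(ttls), max(ttls)


# Hypothetical zone contents. The SOA RR's own TTL counts like any other
# TTL; the last field of the SOA RDATA is not looked at.
zone = [
    ("example.", 3600, "SOA"),
    ("example.", 86400, "NS"),
    ("www.example.", 300, "A"),
]
minimum_ttl, maximum_ttl = zone_ttl_bounds(zone)
```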


----------------------------------------------------------------------

 
> #2.  Keeping the Chain of Trust Intact
> 
> #   For the verifying clients it is important that data from secured
> #   zones can be used to build chains of trust regardless of whether the
> #   data came directly from an authoritative server, a caching nameserver
> #   or some middle box.  Only by carefully using the available timing
> #   parameters can a zone administrator assure that the data necessary
> #   for verification can be obtained.
> 
> I don't think that the last sentence is right.  If an admin is unaware of the
> timing parameters, data will only be delayed. 

Let me try to rephrase that last sentence.

Only by carefully timing their actions can zone administrators
assure that the data necessary for verification can be obtained by
validating clients.



----------------------------------------------------------------------

> It might be good to note that
> the time from master to slave is negligible when using NOTIFY and IXFR,
> increasing by reliance on AXFR, and more if you rely on the SOA timing
> parameters for zone refresh.  (Non-standard means of zone transfers have
> other timing concerns.)  When it comes to freshness of data within caches,
> the TTL is the only pertinent parameter, with a shorter setting increasing
> freshness at the cost of fewer cache "hits."

I propose to modify the 3rd paragraph of section 2 by appending your 
text:

   Administrators of secured zones will have to keep in mind that data
   published on an authoritative primary server will not be
   immediately seen by verifying clients; it may take some time for
   the data to be transferred to other secondary authoritative
   nameservers and clients may be fetching data from caching
   non-authoritative servers. In this light it is good to note that
   the time from master to slave is negligible when using NOTIFY and
   IXFR, increasing by reliance on AXFR, and more if you rely on the
   SOA timing parameters for zone refresh.


----------------------------------------------------------------------

> 
> I don't know what detail you want to include, but you should mention the
> sliding scale of performance between master and slave, and a note on what
> parameter(s) affect cache performance.

Is the detail in the above paragraph sufficient? If not, could you help
us out with suggestions for text?



----------------------------------------------------------------------

> 
> #3.1  Zone and Key Signing Keys
> #
> #   The DNSSEC validation protocol does not distinguish between DNSKEYs.
> #   All DNSKEYs can be used during the validation.  In practice operators
> 
> Is that true?  Is that because DNSKEYs must have the zone key bit set?
> I forget how we resolved TKEY stuff the type code roll.

What about:

 The DNSSEC validation protocol does not distinguish between DNSKEYs
 with the SEP flag set or cleared. Both kinds of DNSKEY can be used
 during the validation. ...


----------------------------------------------------------------------

> 
> #3.3  Key Effectivity Period
> 
> #   For Key Signing Keys a reasonable key effectivity period is 13
> #   months, with the intent to replace them after 12 months.  An intended
> #   key effectivity period of a month is reasonable for Zone Signing
> #   Keys.
> 
> Shouldn't this be linked to some minimum size of the key?
> 


What about prepending a phrase to the above paragraph and introducing
a new one-sentence paragraph directly after it:

+  From a purely operational perspective a reasonable key effectivity
+  period for Key Signing Keys is 13 months, with the intent to
   replace them after 12 months.  An intended key effectivity period
   of a month is reasonable for Zone Signing Keys.

+  For a key-size that matches these effectivity periods see section 3.5.

   Using these recommendations will lead to rollovers occurring
   frequently enough to become part of 'operational habits'; the
   procedure does not have to be reinvented every time a key is
   replaced.

   Key effectivity periods can be made very short, as in the order of a
   few minutes.  But when replacing keys one has to take the
   considerations from Section 4.1 and Section 4.2 into account.


----------------------------------------------------------------------

> 
> #3.6  Private Key Storage
> #
> #   It is recommended that, where possible, zone private keys and the
> #   zone file master copy be kept and used in off-line, non-network
> #   connected, physically secure machines only.  Periodically an
> #   application can be run to add authentication to a zone by adding
> #   RRSIG and NSEC RRs.  Then the augmented file can be transferred,
> 
> The problem with this recommendation is that a lot of the upper (sensitive)
> zones have a "response time" pressure that pushes them to use dynamic update.
> Currently, dynamic update (tools) don't allow the inclusion of signatures,
> this might have to be fixed.  Left in limbo is the recommendation to
> keep keys entirely off-line.

I understand your observation but I find it difficult to turn this
into document text. Do you have a suggestion?


> #   perhaps by sneaker-net, to the networked zone primary server machine.
> 
> It's ironic that this basic tenet of the DNSSEC world is somewhat out of
> whack with what are labeled as the most sensitive zones.

The irony is not intentional. How can we improve?


----------------------------------------------------------------------


> #4.1.1  Time Considerations
> #
> 
> #   o  We suggest the signature publication period to be at least one
> #      maximum TTL smaller than the signature validity period.
> #         Resigning a zone shortly before the end of the signature
> #         validity period may cause simultaneous expiration of data from
> #         caches.  This in turn may lead to peaks in the load on
> #         authoritative servers.
> 
> This is confusing.
> 
> Are you suggesting that the publication period of a signature end at least
> one maximum TTL duration before the end of the signature's validity period?

Yes, suggested rephrase:
                                   
 o   We suggest the publication period of a signature end at least one
     maximum TTL duration before the end of the signature's validity
     period.

         Resigning a zone shortly before the end of the signature
         validity period may cause simultaneous expiration of data
         from caches.  This in turn may lead to peaks in the load on 
         authoritative servers.
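
The rephrased suggestion boils down to one subtraction. A minimal
sketch, with a hypothetical function name and example values:

```python
from datetime import datetime, timedelta


def latest_replacement_time(sig_expiration, maximum_zone_ttl):
    """Latest moment by which a signature should be replaced in the zone.

    The publication period ends at least one maximum zone TTL (seconds)
    before the end of the signature's validity period.
    """
    return sig_expiration - timedelta(seconds=maximum_zone_ttl)


# Hypothetical values: signature expires 1 July 2005, maximum TTL is one day.
deadline = latest_replacement_time(datetime(2005, 7, 1, 0, 0), 86400)
# deadline is 2005-06-30 00:00
```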

----------------------------------------------------------------------


> 
> #   o  We suggest the minimum zone TTL to be long enough to both fetch
> #      and verify all the RRs in the authentication chain.  A low TTL
> #      could cause two problems:
> #         1.  During validation, some data may expire before the
> #         validation is complete.  The validator should be able to keep
> #         all data, until is completed.  This applies to all RRs needed
> #         to complete the chain of trust: DSs, DNSKEYs, RRSIGs, and the
> #         final answers i.e. the RR set that is returned for the initial
> #         query.
> #         2.  Frequent verification causes load on recursive nameservers.
> #         Data at delegation points, DSs, DNSKEYs and RRSIGs benefit from
> #         caching.  The TTL on those should be relatively long.
> 
> A low TTL has been demonstrated in workshops to be detrimental.  (Not a
> "could.")  Even in a close-in workshop, TTLs of under 5 or 10 minutes
> disrupted operations.  In the wide Internet, the floor of the TTL will
> have to be much higher.

Do we have a reference to the minutes of these workshops? If not I
propose to start the above paragraph with:


  o We suggest the minimum zone TTL to be long enough to both fetch
    and verify all the RRs in the authentication chain.  In workshop
    environments it has been demonstrated [E.Lewis: private
    communication] that a low TTL (under 5 to 10 minutes) caused
    disruptions because of the following two problems:




----------------------------------------------------------------------

> 
> #         When a slave server is out of sync with its master and data in
> #         a zone is signed by expired signatures it may be better for the
> #         slave server not to give out any answer.
> 
> #         We suggest the SOA expiration timer being approximately one
> #         third or one fourth of the signature validity period.  It will
> #         allow problems with transfers from the master server to be
> #         noticed before the actual signature time out.
> 
> One wording choice I noticed - "smaller" rather than "shorter."  When we
> are talking time durations, "longer" and "shorter" are more appropriate.
> 
> I agree with the recommendation here, but I am not sure about the build up.
> I think that a slave ought to continue to serve up RRSIGs whose time has
> passed in the face of having lost contact with the master.  For two reasons,
> one is that the clock on the slave might be wrong and the other is that
> resolvers might be willing to accept past-due data or are completely ignoring
> DNSSEC.

But it will cause a blackout for those clients that pull from
that "SOA timed out" server, if they do not ignore DNSSEC and do not
ignore signature validity time.  Lameness is probably better than
complete blackouts.

We use "it may be better" deliberately.

Unless there are objections or alternative text I intend to keep the
text as is.
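
For reference, the SOA expiration suggestion in the quoted text is
simple arithmetic. A minimal sketch, where the 30-day signature validity
period is only an assumed example value:

```python
def suggested_soa_expire_range(signature_validity_seconds):
    """Return (one fourth, one third) of the signature validity period.

    Both values are in seconds, rounded down to whole seconds.
    """
    return signature_validity_seconds // 4, signature_validity_seconds // 3


# Assumed example: signatures valid for 30 days.
thirty_days = 30 * 24 * 3600
low, high = suggested_soa_expire_range(thirty_days)
# low is 7.5 days (648000 s), high is 10 days (864000 s)
```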


----------------------------------------------------------------------

> 
> #4.2.1.1  Pre-publish key set Rollover
> #
> 
> #    normal          pre-roll         roll            after
> #
> #    SOA0            SOA1             SOA2            SOA3
> #    RRSIG10(SOA0)   RRSIG10(SOA1)    RRSIG11(SOA2)   RRSIG11(SOA3)
> #
> #    DNSKEY1         DNSKEY1          DNSKEY1         DNSKEY1
> #    DNSKEY10        DNSKEY10         DNSKEY10        DNSKEY11
> #                    DNSKEY11         DNSKEY11
> #    RRSIG1 (DNSKEY) RRSIG1 (DNSKEY)  RRSIG1(DNSKEY)  RRSIG1 (DNSKEY)
> #    RRSIG10(DNSKEY) RRSIG10(DNSKEY)  RRSIG11(DNSKEY) RRSIG11(DNSKEY)
> 
>                       RRSIG11(DNSKEY)  RRSIG10(DNSKEY)
> 
> Those too?

No.. just as was written.

You introduce the key but you do not sign with it yet. You just allow
the public key to get "introduced" into the caches. 

The public key is published but the key's effectivity period starts
only at the "roll".

I hope this is clear. It is the core of the document.
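
To spell the table out, here is a minimal sketch that transcribes the
four stages into a data structure and reports which keys are published
without signing anything. The dictionary layout and function name are
my own illustration.

```python
# Stages of the pre-publish rollover, transcribed from the table above.
# Key 1 is the KSK; keys 10 and 11 are the old and new ZSK.
STAGES = {
    "normal":   {"published": {1, 10},     "signs_data": {10}, "signs_keyset": {1, 10}},
    "pre-roll": {"published": {1, 10, 11}, "signs_data": {10}, "signs_keyset": {1, 10}},
    "roll":     {"published": {1, 10, 11}, "signs_data": {11}, "signs_keyset": {1, 11}},
    "after":    {"published": {1, 11},     "signs_data": {11}, "signs_keyset": {1, 11}},
}


def published_but_not_signing(stage):
    """Keys present in the DNSKEY set that make no signatures in this stage."""
    s = STAGES[stage]
    return s["published"] - (s["signs_data"] | s["signs_keyset"])


# Sanity check: a key that signs must also be published.
for s in STAGES.values():
    assert (s["signs_data"] | s["signs_keyset"]) <= s["published"]

pre_roll_idle = published_but_not_signing("pre-roll")  # {11}: introduced only
roll_idle = published_but_not_signing("roll")          # {10}: kept for caches
```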

----------------------------------------------------------------------

> #      DNSKEY 10 is used to sign all the data of the zone, the zone-
> #      signing key.
> #   pre-roll: DNSKEY 11 is introduced into the key set.  Note that no
> #      signatures are generated with this key yet, but this does not
> #      secure against brute force attacks on the public key.  The minimum
> #      duration of this pre-roll phase is the time it takes for the data
> #      to propagate to the authoritative servers plus TTL value of the
> #      key set.  This equates to two times the Maximum Zone TTL.
> 
> Aren't all keys required to sign the key set? 

Only the keys a DS record points to. 


----------------------------------------------------------------------
> 
> #   roll: At the rollover stage (SOA serial 2) DNSKEY 11 is used to sign
> #      the data in the zone exclusively  (i.e. all the signatures from
> #      DNSKEY 10 are removed from the zone).  DNSKEY 10 remains published
> #      in the key set.  This way data that was loaded into caches from
> #      version 1 of the zone can still be verified with key sets fetched
> #      from version 2 of the zone.
> #      The minimum time that the key set including DNSKEY 10 is to be
> #      published is the time that it takes for zone data from the
> #      previous version of the zone to expire from old caches i.e. the
> #      time it takes for this zone to propagate to all authoritative
> #      servers plus the Maximum Zone TTL value of any of the data in the
> #      previous version of the zone.
> 
> Not the maximum TTL, but the TTL of the key set.


You want data that is still cached and signed with signature 10 (as in
version SOA1 of the zone) to expire before you remove key10 from
SOA2. That timing is set by the TTL of the zone data, not the DNSKEY.

>  (This could be more
> complicated as there would have been a TTL of one week yesterday, then
> shortened to an hour today.)

Agreed... but I think mentioning this would complicate rather than
clarify.

You could actually also mention that one could take the 'maximum'
SIGNATURE expiration time of the data at the roll to determine when
the post-roll should occur. Caches should throw out the data once that
expiration time has passed. Since that provides you with an absolute time
to do the rollover, that may actually be a good recommendation to give
too. (Also see RFC 4035 section 4.5: "The resolver SHOULD discard the
entire atomic entry when any of the RRs contained in it expire".)

Any opinion from the working group?
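
The two ways of timing the removal of the old key discussed above can be
put side by side. A minimal sketch: the function names, the one-hour
propagation delay and all dates are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def earliest_removal_by_ttl(roll_time, propagation_delay, maximum_zone_ttl):
    """Relative rule: propagation to all authoritative servers plus the
    Maximum Zone TTL (seconds) of the previous version of the zone."""
    return roll_time + propagation_delay + timedelta(seconds=maximum_zone_ttl)


def earliest_removal_by_signature(old_signature_expirations):
    """Absolute rule: the latest expiration time of any signature made
    with the old key that was still in the zone at the roll."""
    return max(old_signature_expirations)


roll_time = datetime(2005, 6, 8, 12, 0)
by_ttl = earliest_removal_by_ttl(roll_time, timedelta(hours=1), 86400)
by_signature = earliest_removal_by_signature(
    [datetime(2005, 6, 20, 0, 0), datetime(2005, 7, 1, 0, 0)]
)
```

With these example values the TTL rule gives 9 June 13:00 and the
signature rule gives 1 July 00:00, so the signature rule is the more
conservative of the two here but needs no estimate of propagation time.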



----------------------------------------------------------------------

> #4.2.1.3  Pros and Cons of the Schemes
> 
> #   Pre-publish-key set rollover: This rollover does not involve signing
> #      the zone data twice.  Instead, before the actual rollover, the new
> #      key is published in the key set and thus available for
> #      cryptanalysis attacks.  A small disadvantage is that this process
> #      requires four steps.  Also the pre-publish scheme involves more
> #      parental work when used for KSK rollovers as explained in
> #      Section 4.2.
> 
> I don't think that cryptanalysis is possible without a signature to go
> along with the public key, however, dictionary attacks are possible.
> (As in "where have I seen this public key before and did I break it?")

The cryptanalysis attack was mentioned in 4.2.1.3; should we remove
that line?

Your editor needs guidance.

----------------------------------------------------------------------


> #   The scenario above puts the responsibility for maintaining a valid
> #   chain of trust with the child.  It also is based on the premises that
> #   the parent only has one DS RR (per algorithm) per zone.  An
> #   alternative mechanism has been considered.  Using an established
> #   trust relation, the interaction can be performed in-band, and the
> #   removal of the keys by the child can possibly be signaled by the
> #   parent.  In this mechanism there are periods where there are two DS
> #   RRs at the parent.  Since at the moment of writing the protocol for
> #   this interaction has not been developed further discussion is out of
> #   scope for this document.
> 
> Perhaps you should also show the DS set at the parent in the example.
> Later you have one, but it is for the 2 DS at the parent option.

Ack, proposed diagram:

       Parent:
       normal                  between "roll"         
                               and "after"
       SOA0                    SOA3
       RRSIGpar(SOA0)          RRSIGpar(SOA3)
       DS1                     DS2
       RRSIGpar(DS)            RRSIGpar(DS)       


       normal          roll                           after

       SOA0            SOA1                           SOA2
       RRSIG10(SOA0)   RRSIG10(SOA1)                  RRSIG10(SOA2)

       DNSKEY1         DNSKEY1                        DNSKEY2
                       DNSKEY2
       DNSKEY10        DNSKEY10                       DNSKEY10
       RRSIG1 (DNSKEY) RRSIG1 (DNSKEY)                RRSIG2(DNSKEY)
                       RRSIG2 (DNSKEY)
       RRSIG10(DNSKEY) RRSIG10(DNSKEY)                RRSIG10(DNSKEY)


----------------------------------------------------------------------

> 
> #4.2.3  Difference Between ZSK and KSK Rollovers
> #
> #   Note that KSK rollovers and ZSK rollovers are different.  A zone-key
> #   rollover can be handled in two different ways: pre-publish (Section
> #   4.2.1.1) and double signature (Section 4.2.1.2).
> 
> They really aren't that different - it's just the interaction with the
> parent and waiting on the parent that is different.  To a KSK, the "entire"
> zone is the DNSKEY set, as opposed to all sets for the ZSK.

Suggestion

   Note that KSK rollovers and ZSK rollovers are slightly different.
                                                 ^^^^^^^^
       
----------------------------------------------------------------------



> 
> #4.3  Planning for Emergency Key Rollover
> #
> #   This section deals with preparation for a possible key compromise.
> #   Our advice is to have a documented procedure ready for when a key
> #   compromise is suspected or confirmed.
> #
> #   When the private material of one of your keys is compromised it can
> #   be used for as long as a valid authentication chain exists.  An
> #   authentication chain remains intact for:
> #   o  as long as a signature over the compromised key in the
> #      authentication chain is valid,
> #   o  as long as a parental DS RR (and signature) points to the
> #      compromised key,
> 
> This is a considerable problem.  A reminder that DS records ought to be
> conservatively signed.


Suggestion:

      o as long as a parental DS RR (and signature) points to the
        compromised key (also see 4.4.4  DS Signature Validity Period)

----------------------------------------------------------------------

> 
> #4.3.1  KSK Compromise
> #
> #   When the KSK has been compromised the parent must be notified as soon
> #   as possible using secure means.  The key set of the zone should be
> #   resigned as soon as possible.  Care must be taken to not break the
> #   authentication chain.  The local zone can only be resigned with the
> #   new KSK after the parent's zone has created and reloaded its zone
> #   with the DS created from the new KSK.  Before this update takes place
> #   it would be best to drop the security status of a zone all together:
> #   the parent removes the DS of the child at the next zone update.
> #   After that the child can be made secure again.
> 
> During any emergency impacting a system, I don't expect the system to
> continue operating smoothly.  As here, if there is a compromised key,
> I don't expect maintaining the authentication chain is a priority.  Two
> things might be reasonable - dropping security and publication of the
> problem via other channels.
> 
> Minimizing an outage is a priority of course, meaning that one key ought
> not cause disruption for sibling domains.
> 
> #
> #   An additional danger of a key compromise is that the compromised key
> #   can be used to facilitate a legitimate DNSKEY/DS and/or nameserver
> #   rollover at the parent.  When that happens the domain can be in
> #   dispute.  An authenticated out of band and secure notify mechanism to
> #   contact a parent is needed in this case.
> 
> It's never wise to secure a system only by using the system's security.

We already suggested alternative text; please see:
http://darkwing.uoregon.edu/~llynch/dnsop/msg03461.html

I hope that text is clearer.

----------------------------------------------------------------------

> 
> #4.4.3  Security Lameness
> #
> #   Security Lameness is defined as what happens when a parent has a DS
> #   RR pointing to a non-existing DNSKEY RR.  During key exchange a
> #   parent should make sure that the child's key is actually configured
> #   in the DNS before publishing a DS RR in its zone.  Failure to do so
> #   could cause the child's zone being marked as Bogus.
> 
> I think it is dangerous to suggest that the parent check the health of the
> child.  During key rollover, I think the child ought to be looking to see
> when the parent has changed the DS record before changing the child's DNSKEYs.

Now I am confused. I can imagine you would not like to see the
suggestion that the parent checks the health of the child, but the
child not having a DNSKEY before the parent publishes a DS would
really break things. It would at least consign the "double signature
rollover" to the dustbin, and registries would then have to deal with
multiple DSs in their zone.

> If you have the child looking up and the parent looking down, you run the
> risk of a control "loop."  I think the burden ought to be on the child to
> always make sure it is well represented in the parent, to keep the attention
> focused in one direction.  Also, a large delegating parent might waste time
> on the well-run children instead of helping out the needy kids.
> 
> Consider too that a child knows better what its outage (connectivity)
> situation is (than does the parent), which could account for any "missing"
> keys.


Does changing the "should" into "could" in the above paragraph address
your uneasiness?

  "During key exchange a parent could make sure that the child's key is
   actually configured"


----------------------------------------------------------------------

> 
> #4.4.4  DS Signature Validity Period
> #
> #   Since the DS can be replayed as long as it has a valid signature, a
> #   short signature validity period over the DS minimizes the time a
> #   child is vulnerable in the case of a compromise of the child's
> #   KSK(s).  A signature validity period that is too short introduces the
> #   possibility that a zone is marked Bogus in case of a configuration
> #   error in the signer.  There may not be enough time to fix the
> #   problems before signatures expire.  Something as mundane as operator
> #   unavailability during weekends shows the need for DS signature
> #   validity periods longer than 2 days.  We recommend the minimum for a
> #   DS signature validity period of a few days.

> 
> Weeks.  For a large zone, days are not enough.
> 
> It's not the signing that's a problem, it's the management of the registry
> that is.

If the signing is not a problem then I do not understand why the
management of the registry is a problem; the DS RRs that are published
by the parent are not subject to change.

Or is "the management of the registry" not a piece of software but
pieces of bio-ware that sets the minimal values?


----------------------------------------------------------------------
> #   The maximum signature validity period of the DS record depends on how
> #   long child zones are willing to be vulnerable after a key compromise.
> #   Other considerations, such as how often the zone is (re)signed can
> #   also be taken into account.
> #
> #   We consider a signature validity period of around one week to be a
> #   good compromise between the operational constraints of the parent and
> #   minimizing damage for the child.
> 
> One week is not realistic, one month is what to prepare for.  IMHO, 
> putting any timescale in this document might create unrealistic 
> expectations unless the timescale is a necessary piece of the 
> protocol.

Hmmm, the document doesn't force you to choose this particular compromise.

I do think that putting timescales in is something that the "DNSOP"
group can do even though the timing is not a piece of the protocol. In
the document we argue the tradeoff and argue that the timescale is a
compromise.

I do get your point though, you would not like to see this document
being stuffed in your face if you do not meet these
recommendations. Maybe we can address this in the introduction of the
text by adding a line to the first paragraph of the Introduction:


   During workshops and early operational deployment tests, operators
   and system administrators gained experience about operating the DNS
   with security extensions (DNSSEC).  This document translates these
   experiences into a set of practices for zone administrators.  At the
   time of writing, there exists very little experience with DNSSEC in
   production environments; this document should therefore explicitly
+  not be seen as representing 'Best Current Practices'. The intention of
+  this document is to provide guidance; it should not be used to argue
+  that operators violate best practices when they choose not to follow
+  recommendations herein.




> 
> #   In addition to the signature validity period, which sets a lower
> #   bound on the amount of times the zone owner will need to sign the
> #   zone data and which sets an upper bound to the time a child is
> #   vulnerable after key compromise,  there is the TTL value on the DS
> #   RRs.  By lowering the TTL, the authoritative servers will see more
> #   queries, on the other hand a low TTL increases the speed with which
> #   new DS RRs propagate through the DNS.  As argued in Section 4.1.1,
> #   the TTL should be a fraction of the signature validity period.
> 
> A lower TTL doesn't really increase "the speed with which new DS RRs
> propagate through the DNS."  What is true is that it "lowers the persistence
> of DS RRSets in caches, forcing more queries to the authoritative servers."


How about:

  By lowering the TTL, the authoritative servers will see more
  queries; on the other hand, a low TTL lowers the persistence of old
  DS RRSets in caches, thereby increasing the speed with which new DS
  RRs propagate through the DNS.

----------------------------------------------------------------------
> 
> #Appendix A.  Terminology
> 
> #   Secure Entry Point key or SEP Key: A KSK that has a parental DS
> #      record pointing to it.  Note: this is not enforced in the
> #      protocol.  A SEP Key with no parental DS is security lame.


Yess.. this looks weird ... the last sentence is just wrong... 

How about just:

  Secure Entry Point key or SEP Key: A KSK that has a parental DS
      record pointing to it or is configured as a trust anchor.  Note:
      this is not enforced in the protocol.


----------------------------------------------------------------------


-- Olaf

---------------------------------| Olaf M. Kolkman
---------------------------------| RIPE NCC
