Re: [DNSOP] Responding to MSJ review of the previous rfc5011-security-considerations

Michael StJohns Fri, 29 Sep 2017 11:50:18 -0700

Comments on -05 and your changes in line.

On 9/13/2017 1:01 PM, Wes Hardaker wrote:

Mike, after your lengthy last review I went through and carefully made
sure each of your comments were considered.  Most resulted in changes, a
few seemed to be just comments and there was nothing to do, and two we
didn't think were correct.  Below is the summary of the changes in the
most recent version based on your review from July.


Wes Hardaker


Table of Contents
_________________

1 Notes / Edits from Mike StJohn on <2017-07-06 Thu>
.. 1.1 ACCEPTED Use "exclusively" rather than "only" where you can - "only" has
.. 1.2 ACCEPTED In the introduction  - "validators can update" vs "validators 
can
.. 1.3 ACCEPTED In the introduction - same issue with "only recently"
.. 1.4 ACCEPTED In the introduction - "ceasing publication of a revoked key" vs
.. 1.5 FIXED Section 1.1 - What is the definition of "rolled"?   Your use of it
.. 1.6 FIXED Section 2 "which apply to" vs "as required from".
.. 1.7 FIXED Section 2 "Note that it does not..." - which "it" are you talking 
about?
.. 1.8 ACCEPTED Section 2 "using only recently" - same issue as above.
.. 1.9 ACCEPTED Section 2. New para at "Failure of a DNSKEY..."
.. 1.10 ACCEPTED Section 2. "... leaving that key in their trust anchor storage
.. 1.11 ACCEPTED Section 3: Attacker.  "An entity intent..." vs "An attacker
.. 1.12 FIXED Section 4.1:  This doesn't actually describe what's in 5011 -
.. 1.13 ACCEPTED Section 4.1, bullet 3:  Same problem with "only".  Instead 
"Begin
.. 1.14 REJECTED Section 4.2 last para:  This is only an attack if the private 
key
.. 1.15 REJECTED Section 6 - general comment.   You're doing interval 
calculations
.. 1.16 FIXED Section 5.1.1 - You're missing a *very* important point here -
.. 1.17 NOTHINGTODO Section 5.1.1 doesn't actually apply if you use the 5011 
rollover
.. 1.18 FIXED Section 5.1.1, T+35:  "since the hold down time of 30 days + 1/2
.. 1.19 NOTHINGTODO Section 6 - the formulas are wrong.  I also  don't 
understand
.. 1.20 NOTHINGTODO The final formula should be:
.. 1.21 NOTHINGTODO In any event, the point needs to be made that this attack
.. 1.22 FIXED The value in 6 regardless of what it is is the wrong value for

<<< skipped lots of stuff that's OK >>>



1.12 FIXED Section 4.1:  This doesn't actually describe what's in 5011 -
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   specifically bullet 3 was modified to clarify your concerns, but isn't
   what was in 5011.  You need to have both here.

   + Result: Intro paragraph added to note we're paraphrasing 5011 at a
     very high level only.


Nope - not fixed.  You have "This document discusses the following
scenario, which is one of many possible combinations of operations

defined in Section 6 of RFC5011:" followed by "3. Begin to exclusivelyuse recently published DNSKEYs to sign the

appropriate resource records."

5011 does not define this as a possible operation.

Instead: "This document discusses the following scenario which is acombination of the operations of sections 6.1 and 6.2 but ignores theguidance of the first paragraph of section 6 which recommends at leasttwo keys per trust point at all times."






1.14 REJECTED Section 4.2 last para:  This is only an attack if the private key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   is compromised.  In which case, with only a steady state of a single
   key, you've got lots of other problems.  Basically, in your one in one
   out with a steady state of one model, once the current private key is
   compromised you have no ability to fix the problem.  But getting the
   numbers for figuring out when this change becomes "sticky" correct is
   useful.

   + Result: I disagree.  The attack is not just about reusing the stale
     key beyond it's life time.  The attack in this document describes
     the ability to affect the state of the validator so that it's state
     doesn't match the desired state of the Trust Anchor Publisher.
     You're right, that having that state be out of sync isn't useful to
     an attacker until they can break the key for the trust anchor.  But
     an attacker performing this "old state" attack means they have years
     and years to potentially break the key and introduce fake data into
     the dns zone years potentially years the future.  But in the end,
     this *is* an attack against the state of the validator, just not of
     the severity you allude to above.

Strawman alert.

Um... OK. If you get to this point then you've (the attacker has) gotto a) identify those resolvers which have the old key, and b) BREAK THEKEY and c) figure out how a 40 year old computer is still working and d)was NEVER touched by an administrator in all that time.... given that ittook you 40 years and $100s of Millions of dollars to break the key.


*sheesh*



1.15 REJECTED Section 6 - general comment.   You're doing interval calculations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   and you want to do date/time wall clock calculations instead.  While
   the 5011 values are based on intervals from the RRs get to the
   validator, the publisher has to be looking at absolute times first
   (e.g. signature inception and expiration, original TTL) and then
   deltas from those. [Note, this is NOT a new comment - I've made it
   previously and strongly]

   + Result: You've failed to convince me the text needs to change.
     Please propose exact wording (or at least an example) that you feel
     better serves the purpose.  The wall clock and the interval are
     mutatable between each other.  We are calculating the interval to
     wait beyond the publication point (waitTime) which is the same as
     wallclock_now + waitTime.

Here's the deal. ICANN today signs a whole group of root DNSKEY RRSetsonce every 3 or 6 months (I forget what the interval is for when theymeet). I'm not aware of any real security placed on that data oncesigned, but I would venture to guess its not great since its just signeddata.

What I would do is look at when the current root KSK rollover processbegan and see what the actual expiration dates are for the old signedRRSets and then compare it to your calculations. I know my "Look at thelast expiration date of any signed RRSet with the old key in it" givesme a good fixed point in time to work from. I know that your "look atthe signature interval" doesn't without a lot of additional knowledge.

E.g. given the above is your statement in 1.2 unconditionally true?Under which circumstances would it not be true? Could your document beinterpreted in a way to make 1.2 not true?


   + That being said, I changed the intro text a bit to be a bit more
     clear about the fact it starts from the publication time.

     + Based on the conversation in Prague, I'll leave it as is but try
       to clarify a bit about the issue.


1.16 FIXED Section 5.1.1 - You're missing a *very* important point here -
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   that DNSKEY RRSets may be signed ahead of their use.  You need to
   assume that once signed, they are available to be published - even by
   an attacker.  So wherever you have "signature lifetime" you want
   something like "latest signature expiration of any DNSKEY RRSet not
   containing the new key" or at least you want to calculate the
   date/time value based on that.

   + Result: There are two issues here:
     1. When we discuss the exact requirements for publication, we should
        be very very clear about the timing requirements.  I agree.
     2. We're trying to pass on the concept of the attack in this
        section, not necessarily a description that exactly covers all
        possibilities.  So, though I'm all for making it as accurate as
        we can, I don't think we should make the example text so
        confusing to cover all the corner cases that no one can follow
        it.
     3. It doesn't benefit an attacker to publish the signatures ahead of
        time. So you're right that anyone can publish new signatures, it
        really doesn't affect the timing required by the publisher to
        wait, which is the point of this draft.

The attacker doesn't publish the signatures - the publisher hassignatures it won't be using.... the publisher signs stuff way inadvance of publication because getting people together and getting theHSM unlocked to sign things is a big huge expensive production. If thepublisher doesn't think to modify its signing schedule in advance of a5011 action, then your interval calculations are less then useless.

     4. The important take away I take from your text is that any delay
        between signing and publication will affect the length of time to
        wait, and I'm sure this is what you mean by needing to calculate
        via wall-clock (since everything should be based on
        sigexpiration).

   XXX: With this goal in mind, I've cleaned up the text a bit to make it
   a bit more clear.


1.17 NOTHINGTODO Section 5.1.1 doesn't actually apply if you use the 5011 
rollover
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   approach (section 6.3).  E.g. K_old (and any RRSets it signed) will be
   revoked the first time K_new is seen and K_standby is the signing key.
   At this point this reduces to a normal denial of service attack (where
   you prevent new data from being retrieved by the resolver).  You'd
   need a different section to do that analysis. [And thinking about it,
   why is there any practical difference between this attack and a normal
   denial of service attack in the first place?]

   + Result: As we've both agreed in the past, the attack described in
     our 5.1.1 section only applies when you sign exclusively with a key
     that is too new.  So, yes when you are using K_standby, the attack
     in question doesn't work.  We're only describing the case where
     there either isn't a K_standby, or when K_standby is also newer than
     our 'waitTime' time.

   + And, yes by preventing a new key from being accepted as a trust
     anchor, this is a denial of service attack.  Though one with
     potentially serious ramifications since it may require manual
     intervention on all the devices affected by it (unlikely a
     network-based DDoS attack, it doesn't stop when the attacker stops
     sending packets; this is long lived until the configuration is
     manually fixed).

Section 5.1.1 does not apply to preventing a resolver from seeing arevocation. The calculations are different. You could add a newsection describing the revocation attack, but I think all you need to dois note that at the beginning of 5.1.1 and point to section 6.2 as themitigation math.



1.18 FIXED Section 5.1.1, T+35:  "since the hold down time of 30 days + 1/2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   the signature validity... " - Two items: Wordsmithing: The hold down
   time is just the 30 days, not the plus 1/2...  which I *think* given
   the reference to 2.3 is actually the queryInterval. clarify please.
   And queryInterval is not actually 1/2 the signature validity - its the
   MIN (15 days, 10 days (sig life)/2 and 1 day(orig ttl)/2) or 1/2 a
   day.

   + Result: you're misreading the sentence thinking the holddown time
     includes the math with the + and onward, where the holddown time was
     only the 30 minutes.  I'll reword and though I was trying to avoid
     the much more complex math in the example, you're correct that it's
     1/2 the queryInterval which has more terms to calculate it exactly.

The new language works.



1.19 NOTHINGTODO Section 6 - the formulas are wrong.  I also  don't understand
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   where you got MAX(originalTTL/2,15 days) - there's no support for this
   in the text.

   You're misreading the commas.  One of the terms in the outer max
   clause is "MAX(original TTL of K_old DNSKEY RRSet) / 2".  This is
   slightly different than what is in the "1/2*OrigTTL" clause in 5011
   itself.  This is because if the publisher changes TTLs over the course
   of signing, you have to take the maximum value of any of them, not
   just the most recent.  (though, to be super-accurate you need to do
   some math which we might want to describe about when a given TTL is
   published vs when Knew is introduced).

   Anyway, in the end, the formula in ours draft directly derives from
   what is in yours.  We do take into account the possibility of multiple
   TTLs for a given signature set, which 5011 doesn't take into account
   (and to some extent, it's less important, but only further shows how
   much variance a resolver might have before accepting a new trust
   anchor).

   A clear piece of advice for an eventual BCP would be to not change
   TTLs at the same time you start any 5011 publication or revocation
   process.

Note that in 6.1 you have 5 terms, but in the fully expanded equation in6.1.6 you have 4. You're missing the safetyMargin which you didn'tactually define completely in section 6.1.5.



1.20 NOTHINGTODO The final formula should be:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   EarliestDateWhereAttackFails::= latest SignatureExpiration of any
   DNSKEY RRSet not containing the new key

   [ ... alternate math analysis deleted for brevity ... ]

   + Result: there are multiple ways to list out the math / timing behind
     this.  You do spell out an alternate way to lay out the math; I
     believe your purpose was not to offer a replacement but to ensure we
     agree upon the timing.  I do see that you're using wall-clock type
     math, which certainly works and should be equivalent to your example
     start tiem of 5/1/2017 00:00UTC + our waitTime.

   + I do see your point that anyone could publish the data once signed,
     not just the

Then please write this document to assume people are stupid in how theydo things (e.g. signing a lot of data that then doesn't get used...)



1.21 NOTHINGTODO In any event, the point needs to be made that this attack
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   - while real -
   is a "retail" attack that would be difficult to prosecute by a single
   attacker across a broad range of end-entities.  This goes to the
   general model that no publisher of DNS data knows each and every
   consumer of that data, nor can wait its publication on every consumer
   getting published data.  The data protocol for DNS is unidirectional
   and the update protocol in 5011 was designed with that in mind.

   + Result: We never state in the document that the attack was against
     every validator out there.  It affects only the validators that
     received replayed key sets and signatures.  We don't dive into the
     analysis of how difficult it is to achieve such a replay, either
     directly or by cache poisoning upstream resolvers, or....  That's
     outside the scope of the document.  Thus, I don't see any change to
     the text that is needed as I think the text is pretty clear that a
     replay attack is already necessary in the attack model.

OK. Section 5 first para should probably be more clear on this, but Ican live with it.


_______________________________________________
DNSOP mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/dnsop

Re: [DNSOP] Responding to MSJ review of the previous rfc5011-security-considerations

Reply via email to