Re: [IPsec] Review of draft-ietf-ipsecme-ddos-protection-06

Valery Smyslov Fri, 03 Jun 2016 06:24:32 -0700

Hi Paul,

    An obvious defense, which is described in Section 4.2, is limiting
    the number of half-open SAs opened by a single peer.  However, since
    all that is required is a single packet, an attacker can use multiple
    spoofed source IP addresses.


 I am not sure why this is mentioned here in this way, because the attack
 of spoofed source IP is already handled effectively with DOS cookies. I
 think it is better to state "bot-nets are large enough that they have
 enough unique IP addresses" and avoid talking about spoofing in this
 section altogether.


Here are some general observations of IKEv2 vulnerabilities,
regardless of the existing and proposed defense mechanisms, which are
described in subsequent sections.


But it is incomplete and out of place. The section is about The
Vulnerability. It talks about vulnerabilities, then this one solution to
one thing, then goes into detail about the work that makes it
vulnerable. That is why I suggest to just remove the paragraph.


Ok, I see your point.

    Stage #3 includes public key operations, typically more than one.

 It seems this sentence needs to say something that these operations are
 very expensive, similar to describing the "effort" in the previous
 sentences of stage #1 and stage #2.


OK. How about:

   Stage #3 may include public key operations if certificates are involved.
   These operations are often more computationally expensive than those
   performed at stage #2.


Looks good.

    It seems that the first thing cannot be dealt with at the IKE level.
    It's probably better left to Intrusion Prevention System (IPS)
    technology.

 I would rewrite this more authoritatively, and not use the word "seems"


OK. How about:

   If an attacker is so powerful that it is able to overwhelm
   the Responder's CPU that deals with generating cookies,
   then the attack cannot be dealt with at the IKE level and
   must be handled by means of the Intrusion Prevention System (IPS)
   technology.


Looks good.

    Depending on the Responder implementation, this can be repeated with
    the same half-open SA.

 I don't think this "depends on the implementation". Since any on-path
 attacker can spoof rubbish, a Responder MUST ignore the failed packet
 and remain ready to accept the real one for a certain amount of time.

"Depending on the Responder implementation" means here that if, along with
discarding the failed packet, the Responder also discards the computed SK_*
keys, then it will need to re-calculate them when the next IKE_AUTH packet
is received, so the attack can be repeated. The SK_* keys don't depend on
IKE_AUTH messages, so in general there is no need to discard them even if
the received IKE_AUTH packet failed to decrypt properly, and the draft
advises keeping them in this case. However, implementations may have good
reasons to discard them (e.g. to free hardware resources if crypto is
performed in HW).


Oh, I didn't realise you talked about re-using DH components. Ok, in that
case it makes sense but you might want to say it only applies to those
who re-use DH calculations between different IKE peers. Our software
never does that (and I think FIPS also puts additional constraints on
this)

No, it is not about re-using the DH private key with different peers.
I probably explained it poorly. Let me try again.


Once the IKE_SA_INIT is complete the responder has all the data needed
to calculate SKEYSEED and the SK_* keys. However, these are CPU-consuming
operations, so the responder may want to postpone them until the keys are
really needed, i.e. until it receives the IKE_AUTH request from the initiator.

This behaviour allows the responder not to waste resources in case
IKE_SA_INIT was from an attacker and the IKE_AUTH request never comes.
Once the IKE_AUTH request arrives the responder performs DH and calculates
SKEYSEED and the SK_* keys, which allow it to decrypt and verify this request.
In case it fails to decrypt the IKE_AUTH request, the responder has two
possibilities: keep the just-calculated SK_* keys until the next (hopefully
proper) IKE_AUTH request is received, or discard them (e.g. to save crypto
resources) and recalculate them once the next IKE_AUTH request is received
(note that re-calculating will result in EXACTLY the same keys, since they
don't depend on any data from IKE_AUTH). The draft recommends keeping the
keys until the proper IKE_AUTH request is received (or until the exchange
times out). This advice may look obvious, but I think it is still worth
mentioning.
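
For illustration only (this is not text from the draft), the behaviour
described above can be sketched as follows. The class name, the HMAC-based
stand-ins for the DH computation and SK_* derivation, and the integrity
check are all assumptions made for the example; real IKEv2 key derivation
is different. The point shown is only that the keys are derived lazily on
the first IKE_AUTH and kept after a failed one.

```python
import hashlib
import hmac


class HalfOpenSA:
    """Hypothetical responder-side state kept after IKE_SA_INIT completes."""

    def __init__(self, init_inputs: bytes, nonces: bytes):
        # Everything needed for the keys is fixed by IKE_SA_INIT;
        # nothing from IKE_AUTH contributes to them.
        self._init_inputs = init_inputs
        self._nonces = nonces
        self._keys = None        # stand-in for the SK_* keys, derived lazily
        self.derivations = 0     # counts the expensive computations

    def _derive_keys(self) -> bytes:
        # Stand-in for the DH computation plus SKEYSEED / SK_* derivation.
        self.derivations += 1
        return hmac.new(self._nonces, self._init_inputs, hashlib.sha256).digest()

    def handle_ike_auth(self, ciphertext: bytes, tag: bytes) -> bool:
        # Postpone the expensive work until the first IKE_AUTH arrives.
        if self._keys is None:
            self._keys = self._derive_keys()
        expected = hmac.new(self._keys, ciphertext, hashlib.sha256).digest()
        # On failure the keys are deliberately kept, so a forged IKE_AUTH
        # cannot force the same derivation to be repeated.
        return hmac.compare_digest(expected, tag)
```

With this shape, any number of forged IKE_AUTH messages against the same
half-open SA costs one key derivation in total, not one per message.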


I recall we've already discussed this while reviewing the -05 version...

Please, see above.

Do you think more explanations are needed here?


No I guess it is fine.


Are you sure after the above explanation?

    Retransmission policies in practice wait at least one or two seconds
    before retransmitting for the first time.

 I'm not sure if this is still true. Libreswan starts at 0.5s and doubles,
 and I know that iOS was faster too.


Well, there are different implementations and each has its own
retransmission policy. The Responder should take into account
the slowest sensible retransmission policy, which seems to be the one
described in the draft.


Will the following text make you happy?

   Many retransmission policies in practice wait one or two seconds
   before retransmitting for the first time.


It would be nicer to rewrite it without mentioning any absolute times.
That way the text will also remain more relevant in the future if/when
these timings change.


I don't think it is a good idea. The draft should give implementers some
estimate timings. "One or two seconds" is here a "worst case". If implementers
take this data into consideration when selecting the short timeout,
they'll always be on the safe side, because if some implementations retransmit
more aggressively, then they'll always fit within this time period.

So I'd rather keep the text as above.
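
For illustration only, the "worst case" argument can be checked with a
little arithmetic. The function below is a hypothetical helper, and the
2-second short timeout is an assumed value chosen for the example, not a
figure from the draft. It lists when an Initiator retransmits, given its
first delay and a back-off factor, before a given half-open SA timeout.

```python
def retransmit_times(first_delay: float, factor: float, timeout: float) -> list:
    """Times (seconds after the initial send) of retransmissions up to `timeout`."""
    times = []
    t = first_delay
    delay = first_delay
    while t <= timeout:
        times.append(t)
        delay *= factor   # exponential back-off between retransmissions
        t += delay
    return times


# Assumed short timeout of 2 seconds:
aggressive = retransmit_times(0.5, 2.0, 2.0)    # starts at 0.5 s and doubles
conservative = retransmit_times(2.0, 2.0, 2.0)  # waits 2 s before the first one
```

Under these assumptions the aggressive policy gets two retransmissions in
(at 0.5 s and 1.5 s) and the conservative one gets exactly one (at 2.0 s),
so a timeout sized for the slow policy also covers the faster ones.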

    When not under attack, the half-open SA timeout SHOULD be set high
    enough that the Initiator will have enough time to send multiple
    retransmissions, minimizing the chance of transient network
    congestion causing IKE failure.

 I agree, but I'd like to note that this and the text just above mentioning
 "several minutes" is kind of archaic. We found a limit of 30 seconds on
 other implementations so common as a timeout, that we see no more value in
 keeping an IKE exchange around for more than 30 seconds. (we do re-start
 and try a new exchange from scratch for longer, in some configurations we
 try that forever)


That's what RFC 7296 recommends (Section 2.4).


Okay, fair enough. I guess you mention shortening it while under attack,
so it's all okay.

    For IPv6, ISPs assign between a /48 and a /64, so it makes sense to use
    a 64-bit prefix as the basis for rate limiting in IPv6.

 Why does that make sense over using /48 ? Wouldn't you rather rate limit
 some innocent neighbours over not actually defending against the attack?
 If puzzles work as advertised, real clients on that /48 should still be
 able to connect.

Well, I'm not an IPv6 expert. Probably Michael Richardson (who suggested this
change) or somebody else will comment on this.


This does not so much relate to IPv6 but to whether you rather
overestimate or underestimate the attacker's IP space. If you
underestimate, you will take longer to punish the attacking IPs. If you
overestimate you will needlessly slow down legitimate clients.

I don't know which of the two is better, hence my objection to "it makes
sense" because I don't see that.

What's your suggestion for this text? Just remove "it makes sense" or
completely rewrite the para? If the latter, please provide the text.
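
For illustration only, the /48 versus /64 trade-off comes down to which
key the Responder counts requests under. The sketch below uses Python's
standard ipaddress module; the function name and the choice to key IPv4
peers by full address are assumptions made for the example, not anything
the draft specifies.

```python
import ipaddress


def rate_limit_key(addr: str, v6_prefix_len: int = 64) -> str:
    """Return the bucket under which requests from `addr` are counted."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        # Assumption for this sketch: IPv4 peers are counted individually.
        return str(ip)
    # strict=False masks off the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{ip}/{v6_prefix_len}", strict=False))


# Two hosts in the same /48 but in different /64s:
a, b = "2001:db8:1:2::1", "2001:db8:1:3::1"
separate = rate_limit_key(a) != rate_limit_key(b)        # /64 buckets differ
shared = rate_limit_key(a, 48) == rate_limit_key(b, 48)  # one /48 bucket
```

With /64 the two hosts are limited independently (underestimating an
attacker who owns the whole /48); with /48 they share one bucket
(possibly slowing an innocent neighbour).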

    Regardless of the type of rate-limiting used, there is a huge
    advantage in blocking the DoS attack using rate-limiting for
    legitimate clients that are away from the attacking nodes.  In such
    cases, adverse impacts caused by the attack or by the measures used
    to counteract the attack can be avoided.

 I don't understand this paragraph at all. I guess "rate-limiting for
 legitimate clients" just confuses me. I think it might attempt to be
 saying "not blocking ranges with no attackers helps real clients", but
 it is very unclear.


Yoav?

    to calculate the PRF

 One does not "calculate" a PRF. One uses a PRF to calculate something.

OK.


You didn't provide text but I assume you changed it somehow.


s/PRF/"output of PRF" or s/PRF/"the result of PRF"   Is it OK?

 The section that starts with "Upon receiving this challenge," seems to
 be discussing the pros and cons of this method before it has explained
 the method. The reader is forced to skip this or forward to section 7
 and getting back to this part. I suggest to re-order some text to avoid
 this, or to give a better short summary of the puzzle nature just before
 this paragraph.


It describes the puzzles mechanism in general, while Sections 7 & 8
describe the particular instantiation of puzzles in IKEv2.
I'd rather keep some background about puzzles here,
so that all possible defenses are described in one place.


Then I think it still requires a one-line introduction to puzzles.

I'm a bit confused. I've been thinking that the whole Section 4.4
is a high-level description of the puzzles. Where do you want to insert
the one-line introduction?

    When the Responder is under attack, it MAY choose to prefer
    previously authenticated peers who present a Session Resumption
    ticket (see [RFC5723] for details).

 Why is this only a MAY? Why is it not a SHOULD or MUST?


A good question. I think the idea was not to force the Responder
to serve only resumed clients and to let him(her) prioritize
clients according to its own policy. In my opinion MUST is too strong, but
SHOULD is probably OK.


In the famous words of Steve Kent, if you say SHOULD instead of MUST,
explain when the Responder should not.


When it has good reasons :-)

Seriously, consider the situation when the responder finds itself
under attack and switches to only respond to IKE_SESSION_RESUME
requests. In this case it will shut out legitimate clients that have
no resumption ticket (e.g. the ticket expired).

I think there is no reason to put MUST here, since in any case
it is a local policy which dictates the responder's behaviour,
and there are no interoperability issues whether it is MAY,
SHOULD or MUST; it is just a matter of the responder's local policy.

So SHOULD is just good advice.

    The Responder MAY require such
    Initiators to pass a return routability check by including the COOKIE
    notification in the IKE_SESSION_RESUME response message, as allowed
    by Section 4.3.2. of [RFC5723].

 Perhaps this should say the responder SHOULD require COOKIEs for resumed
 sessions if it also requires COOKIEs for IKE_INIT requests. That is, it
 should not give preference to resumed sessions as those could be equally
 forged as IKE_INIT requests.


A good point. I tend to agree. Yoav?

    With a typical setup and typical Child SA lifetimes, there
    are typically no more than a few such exchanges, often less.

 (ignoring the language) I do not believe this is true. This goes back to
 the discussion on how often people deploy liveness probes. Implementors
 seem to think 30s, while endusers want and do configure things like 1s.
 I don't think the text about what number of IKE exchanges is typical
 is needed, because the text below talks about specific abuse anyway,
 and not in terms of just number of exchanges.


Are you suggesting to remove it?


Yes. You can just talk about something like "If an abusive amount of
(otherwise) valid IKE messages are received, ....." and let the
implementor decide how many IKE messages count as abusive?


OK, I see your point.

That also
avoids the question of what to do when rekeys happen, because that would
likely reset the counter because it is a new state?


Well, I think the proper approach is to measure the rate of such
exchanges (per SA of course). So, just reset the counter every
second and measure how many exchanges happened within
the second. If the number looks abusive, take measures.
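
For illustration only, the per-SA rate measurement described above could
look like the following. The class name and the threshold parameter are
hypothetical; what counts as abusive is left to the implementer, as
discussed.

```python
class ExchangeRateCounter:
    """Hypothetical per-SA counter that is reset every second."""

    def __init__(self, max_per_second: int):
        self.max_per_second = max_per_second  # assumed local policy value
        self._window = None                   # the whole second being counted
        self._count = 0

    def record(self, now: float) -> bool:
        """Record one exchange at time `now` (in seconds).

        Returns True if the rate within the current second looks abusive.
        """
        window = int(now)
        if window != self._window:
            # A new second has started: reset the counter.
            self._window = window
            self._count = 0
        self._count += 1
        return self._count > self.max_per_second
```

One such counter would be kept per IKE SA; if the counter is carried over
to the new SA on rekey, a rekey does not reset the measurement.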

       If the peer creates too many Child SA with the same or overlapping
       Traffic Selectors, implementations can respond with the
       NO_ADDITIONAL_SAS notification.

 I think this requires normative language, eg: implementations MUST respond
 with a NO_ADDITIONAL_SAS notification. The same for the next bullet item
 where it says "implementations can introduce an artificial delay", which
 should be like: "MAY introduce an artificial delay" (or even SHOULD, or
 rewrite "too many" to "many" and use MAY)

I'd use MAY and keep "too many". "Too many" means here that a peer is at
least misbehaving, while just "many" doesn't imply this
(in my reading).


You cannot say "too many" and "MAY". If it is too many, it is abusive.
So you MUST take action. On the other hand if you say "many", then you
leave it open to interpretation whether it is abuse or not, and you can
use "MAY".


I see. Language differences :-) Ok, let's remove "too".

 Section 5 switches from talking about "the Responder" to "the
 implementation".
 I think it should be "the Responder" throughout the document.

OK.

     the retransmitted messages should be silently discarded.

 That should be normative too, MUST be discarded.


Agree.


Paul


Thank you,
Valery.

_______________________________________________
IPsec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/ipsec
