Hi Paul,

thank you for the very thorough review (and especially for the nits).

This is a partial review of draft-ietf-ipsecme-ddos-protection-06
up to Section 6. I hope to complete the rest in the next few days.

I think this document needs another revision before continuing.
(and I would prefer it to be split in two)

Issues / Questions:

   An obvious defense, which is described in Section 4.2, is limiting
   the number of half-open SAs opened by a single peer.  However, since
   all that is required is a single packet, an attacker can use multiple
   spoofed source IP addresses.

I am not sure why this is mentioned here in this way, because the attack
of spoofed source IP is already handled effectively with DOS cookies. I
think it is better to state "bot-nets are large enough that they have
enough unique IP addresses" and avoid talking about spoofing in this
section altogether.

Here are some general observations of IKEv2 vulnerabilities,
regardless of the existing and proposed defense mechanisms, which are described in subsequent sections.

   Stage #3 includes public key operations, typically more than one.

It seems this sentence needs to say that these operations are very
expensive, similar to describing the "effort" in the previous
sentences on stage #1 and stage #2.

OK. How about:

   Stage #3 may include public key operations if certificates are involved.
   These operations are often more computationally expensive than those
   performed at stage #2.

   It seems that the first thing cannot be dealt with at the IKE level.
   It's probably better left to Intrusion Prevention System (IPS)
   technology.

I would rewrite this more authoritatively, and not use the word "seems".

OK. How about:

   If an attacker is so powerful that it is able to overwhelm
   the Responder's CPU that deals with generating cookies,
   then the attack cannot be dealt with at the IKE level and
   must be handled by means of Intrusion Prevention System (IPS)
   technology.

   Depending on the Responder implementation, this can be repeated with
   the same half-open SA.

I don't think this "depends on the implementation". Since any on-path
attacker can spoof rubbish, a Responder MUST ignore the failed packet
and remain ready to accept the real one for a certain amount of time.

"Depending on the Responder implementation" means here that if, along
with discarding the failed packet, the Responder also discards the
computed SK_* keys, then it will need to recalculate them when the
next IKE_AUTH packet is received, so the attack can be repeated.
The SK_* keys don't depend on IKE_AUTH messages, so in general there
is no need to discard them even if the received IKE_AUTH packet fails
to decrypt properly, and the draft advises keeping them in this case.
However, implementations may have good reasons to discard them
(e.g. to free hardware resources if crypto is performed in HW).
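
To make the point concrete, here is a minimal sketch of a Responder that
keeps the computed keys after a failed integrity check. It is only an
illustration, not text from the draft: the class HalfOpenSA, its method
names, and the use of SHA-256 and HMAC as stand-ins for the D-H
computation and the ICV check are all assumptions made for the example.

```python
import hashlib
import hmac


class HalfOpenSA:
    """Hypothetical Responder-side state for one half-open IKE SA."""

    def __init__(self, shared_secret: bytes):
        self._shared_secret = shared_secret
        self._sk = None       # cached stand-in for the SK_* keys
        self.derivations = 0  # how many times the expensive step ran

    def _derive_keys(self) -> bytes:
        # Stand-in for the D-H computation and SK_* derivation
        # (assumption: a single hash replaces the real, costly math).
        self.derivations += 1
        return hashlib.sha256(b"SK" + self._shared_secret).digest()

    def on_ike_auth(self, payload: bytes, icv: bytes) -> bool:
        # Derive the keys only once per half-open SA.
        if self._sk is None:
            self._sk = self._derive_keys()
        expected = hmac.new(self._sk, payload, hashlib.sha256).digest()
        # On failure the packet is dropped, but self._sk is kept, so a
        # repeated bogus IKE_AUTH costs only one cheap HMAC check.
        return hmac.compare_digest(expected, icv)
```

With this structure, any number of bogus IKE_AUTH messages on the same
half-open SA triggers the expensive derivation only once. An
implementation that clears the cached keys on failure would pay that
cost again for every bogus message.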

And this also applies to this later section in the document:

   If the received IKE_AUTH message failed to decrypt correctly (or
   failed to pass ICV check), then the Responder SHOULD still keep the
   computed SK_* keys, so that if it happened to be an attack, then the
   malicious Initiator cannot get advantage of repeating the attack
   multiple times on a single IKE SA.

Please, see above.

Do you think more explanation is needed here?

   Retransmission policies in practice wait at least one or two seconds
   before retransmitting for the first time.

I'm not sure if this is still true. Libreswan starts at 0.5s and doubles,
and I know that iOS was faster too.

Well, there are different implementations and each has its own
retransmission policy. The Responder should take into account
the slowest sensible retransmission policy, which seems to be the one described in the draft.

Will the following text make you happy?

   Many retransmission policies in practice wait one or two seconds
   before retransmitting for the first time.
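
To compare the policies mentioned here, the following sketch lists when
retransmissions would be sent under an exponential back-off policy
before a given half-open SA timeout expires. The function name and the
assumption that every policy doubles its delay are mine; the 0.5 s start
and the 30 s timeout are the figures from your comments.

```python
def retransmit_times(initial: float, factor: float, timeout: float) -> list[float]:
    """Offsets (seconds after the first send) of each retransmission
    under a hypothetical exponential back-off policy, before `timeout`."""
    times = []
    elapsed = 0.0
    delay = initial
    # Stop once the next retransmission would not fit before the timeout.
    while elapsed + delay < timeout:
        elapsed += delay
        times.append(elapsed)
        delay *= factor
    return times


# 0.5 s initial delay, doubling, 30 s timeout
print(retransmit_times(0.5, 2, 30))  # [0.5, 1.5, 3.5, 7.5, 15.5]
# 2 s initial delay, doubling (assumed), 30 s timeout
print(retransmit_times(2, 2, 30))    # [2.0, 6.0, 14.0]
```

Under these assumptions the slower policy gets only three
retransmissions within 30 seconds, which is why the Responder's timeout
should be sized for the slowest sensible Initiator.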

   When not under attack, the half-open SA timeout SHOULD be set high
   enough that the Initiator will have enough time to send multiple
   retransmissions, minimizing the chance of transient network
   congestion causing IKE failure.

I agree, but I'd like to note that this and the text just above mentioning
"several minutes" is kind of archaic. We found a limit of 30 seconds on

That's what RFC 7296 recommends (Section 2.4).

other implementations so common as a timeout, that we see no more value in
keeping an IKE exchange around for more than 30 seconds. (we do re-start
and try a new exchange from scratch for longer, in some configurations we
try that forever)

   For IPv6, ISPs assign between a /48 and a /64, so it makes sense to use
   a 64-bit prefix as the basis for rate limiting in IPv6.

Why does that make sense over using /48 ? Wouldn't you rather rate limit
some innocent neighbours over not actually defending against the attack?
If puzzles work as advertised, real clients on that /48 should still be
able to connect.

Well, I'm not an IPv6 expert. Probably Michael Richardson (who suggested this change) or somebody else will comment on this.
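
Just to illustrate what the choice of prefix length changes, here is a
small sketch that maps a peer address to the key used for rate limiting.
The function name rate_limit_key and the idea of keying IPv4 peers by
the full address are assumptions for the example, not something the
draft specifies.

```python
import ipaddress


def rate_limit_key(addr: str, v6_prefix_len: int = 64) -> str:
    """Return the bucket a peer address is counted under.

    IPv4 peers are keyed by the full address (an assumption); IPv6 peers
    are keyed by the enclosing prefix of the given length.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return str(ip)
    # strict=False masks off the host bits instead of raising an error.
    net = ipaddress.ip_network(f"{ip}/{v6_prefix_len}", strict=False)
    return str(net)


a = "2001:db8:1:2:aaaa::1"
b = "2001:db8:1:3:bbbb::1"
print(rate_limit_key(a), rate_limit_key(b))          # two different /64 buckets
print(rate_limit_key(a, 48), rate_limit_key(b, 48))  # one shared /48 bucket
```

With a /64 key, an attacker who holds a whole /48 can spread requests
over up to 65536 separate buckets. With a /48 key, all of them share one
bucket, at the cost of also limiting unrelated hosts in that /48.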

   Regardless of the type of rate-limiting used, there is a huge
   advantage in blocking the DoS attack using rate-limiting for
   legitimate clients that are away from the attacking nodes.  In such
   cases, adverse impacts caused by the attack or by the measures used
   to counteract the attack can be avoided.

I don't understand this paragraph at all. I guess "rate-limiting for
legitimate clients" just confuses me. I think it might attempt to be
saying "not blocking ranges with no attackers helps real clients", but
it is very unclear.

Yoav?

   to calculate the PRF

One does not "calculate" a PRF. One uses a PRF to calculate something.

OK.

The section that starts with "Upon receiving this challenge," seems to
be discussing the pros and cons of this method before it has explained
the method. The reader is forced to skip this or jump ahead to Section 7
and then come back to this part. I suggest re-ordering some text to
avoid this, or giving a better short summary of the nature of puzzles
just before this paragraph.

It describes the puzzles mechanism in general, while Sections 7 & 8
describe the particular instantiation of puzzles in IKEv2.
I'd rather keep some background about puzzles here,
so that all possible defenses are described in one place.

   When the Responder is under attack, it MAY choose to prefer
   previously authenticated peers who present a Session Resumption
   ticket (see [RFC5723] for details).

Why is this only a MAY? Why is it not a SHOULD or MUST?

A good question. I think the idea was not to force the Responder
to serve only resumed clients, and to let it prioritize clients
according to its own policy. In my opinion MUST is too strong,
but SHOULD is probably OK.

   The Responder MAY require such
   Initiators to pass a return routability check by including the COOKIE
   notification in the IKE_SESSION_RESUME response message, as allowed
   by Section 4.3.2. of [RFC5723].

Perhaps this should say the responder SHOULD require COOKIEs for resumed
sessions if it also requires COOKIEs for IKE_INIT requests. That is, it
should not give preference to resumed sessions as those could be equally
forged as IKE_INIT requests.

A good point. I tend to agree. Yoav?

   With a typical setup and typical Child SA lifetimes, there
   are typically no more than a few such exchanges, often less.

(ignoring the language) I do not believe this is true. This goes back to
the discussion on how often people deploy liveness probes. Implementors
seem to think 30s, while end users want and do configure things like 1s.
I don't think the text about how many IKE exchanges are typical is
needed, because the text below talks about specific abuse anyway,
and not in terms of just the number of exchanges.

Are you suggesting to remove it?

      If the peer creates too many Child SA with the same or overlapping
      Traffic Selectors, implementations can respond with the
      NO_ADDITIONAL_SAS notification.

I think this requires normative language, eg: implementations MUST respond
with a NO_ADDITIONAL_SAS notification. The same for the next bullet item
where it says "implementations can introduce an artificial delay", which
should be like: "MAY introduce an artificial delay" (or even SHOULD, or
rewrite "too many" to "many" and use MAY)

I'd use MAY and keep "too many". "Too many" means here that a peer is
at least misbehaving, while just "many" doesn't imply this
(in my reading).

Section 5 switches from talking about "the Responder" to "the implementation".
I think it should be "the Responder" throughout the document.

OK.

    the retransmitted messages should be silently discarded.

That should be normative too, MUST be discarded.

Agree.

I won't comment on the nits.

Thank you,
Valery.
