Pekka Riikonen writes:
> : Now the real question is what are we going to do with the exchanges
> : which are still in progress when the sync message is received, and how
> : do we recover when we notice that we do have missed IKE messages,
> : meaning we have created, rekeyed or deleted some Child SA in those
> : messages we lost because of failover. 
> :
> Draft specifies what to do here as well, at least partly.  Also note that 
> you may never notice missed IKE messages because they happened in the 
> failed cluster node.  Normal crash recovery takes care of any desync 
> problems that may have happened.

Define "normal crash recovery". We got rid of most of the crash
recovery machinery in IKEv2 compared to IKEv1. In RFC 5996 we have text
saying:

        "If connection state becomes sufficiently messed up, a node
        MAY close the IKE SA, as described above. It can then rebuild
        the SAs it needs on a clean base under a new IKE SA."

In section 1.5 we also have text saying:

   o  If an ESP or AH packet arrives with an unrecognized SPI.  This
      might be due to the receiving node having recently crashed and
      lost state, or because of some other system malfunction or attack.
...
   In the first case, if the receiving node has an active IKE SA to the
   IP address from whence the packet came, it MAY send an INVALID_SPI
   notification of the wayward packet over that IKE SA in an
   INFORMATIONAL exchange.  The Notification Data contains the SPI of
   the invalid packet.  The recipient of this notification cannot tell
   whether the SPI is for AH or ESP, but this is not important because
   the SPIs are supposed to be different for the two.  

But there is no real description of what to do when we receive an
INVALID_SPI notification. Most likely the Child SA will be deleted by
sending a normal delete notification, but as the other end does not
know anything about the Child SA, it will not reply to this message.
This results in a half-closed Child SA, about which section 1.4.1
says:

   Half-closed ESP or AH connections are anomalous, and a node with
   auditing capability should probably audit their existence if they
   persist.  Note that this specification does not specify time periods,
   so it is up to individual endpoints to decide how long to wait.  A
   node MAY refuse to accept incoming data on half-closed connections
   but MUST NOT unilaterally close them and reuse the SPIs.

meaning that even when the host which did not crash sends a delete for
the Child SA which is not there anymore, the crashed host will not
reply, and the Child SA stays in the half-closed state.

The RFC 5996 solution to this is to delete the IKE SA and start over.
If we want to do something different here, we need to define that
behavior.

> ...
>    o  The peer should not wait for any pending responses while
>       responding with the new Message ID values.  For example, if the
>       window size is 5 and the peer's window is 3-7, and if the peer has
>       sent requests 3, 4, 5, 6, 7 and received responses only for 4, 5,
>       6, 7 but not for 3, then it should include the value 8 in its
>       EXPECTED_SEND_REQ_MESSAGE_ID payload and should not wait for a
>       response to message 3 anymore.
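
The arithmetic in the quoted example can be sketched as follows. This is
only an illustration of the example's numbers, not text from the draft:
the function name and the rule for the case where nothing has been sent
yet (fall back to the start of the window) are my assumptions.

```python
def expected_send_req_message_id(window_start, sent_requests):
    """Value to advertise in EXPECTED_SEND_REQ_MESSAGE_ID.

    Hypothetical helper: takes the highest request Message ID already
    sent and adds one, without waiting for unanswered requests. If no
    request has been sent, it falls back to the start of the window
    (an assumption, the quoted text does not cover that case).
    """
    return max(sent_requests, default=window_start - 1) + 1


# Numbers from the quoted example: window 3-7, requests 3..7 sent,
# responses received for 4..7 only.
sent = {3, 4, 5, 6, 7}
answered = {4, 5, 6, 7}

next_id = expected_send_req_message_id(3, sent)
abandoned = sorted(sent - answered)  # requests we stop waiting for
```

Here next_id is 8 and the only abandoned request is 3, which matches
the quoted text.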

This opens up new attacks, because receiving a sync message with
Message ID zero now has effects beyond just syncing the maximum
Message ID for future use: it also causes existing exchanges to be
destroyed.

I.e., if hosts A and B have done a sync earlier, meaning the attacker
has a copy of an IKE SA Message ID sync message with Message ID zero,
then the attacker can wait for host B to start a few exchanges and then
replay the old sync message. If host A destroys its existing exchanges
immediately while processing the request, this allows attackers to
delete exchanges at will.

The nonce in the Message ID sync message will not help in this case,
but the failover counter will, because host A will reject the replay
due to the reused failover counter.
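
A minimal sketch of the check I have in mind, assuming the receiver
remembers the highest failover counter it has accepted and requires
every new sync message to carry a strictly larger one. The class, the
field names and the "strictly larger" rule are assumptions made for
illustration, not wording from the draft.

```python
from dataclasses import dataclass, field


@dataclass
class IkeSaSyncState:
    """Hypothetical per-IKE-SA state kept by host A."""
    last_failover_count: int = 0
    pending_exchanges: set = field(default_factory=set)


def handle_sync_request(state, failover_count):
    """Return True if the sync message is accepted.

    A message whose failover counter is not larger than the last
    accepted one is treated as a replay and leaves the exchanges in
    progress untouched.
    """
    if failover_count <= state.last_failover_count:
        return False
    state.last_failover_count = failover_count
    # Only a fresh sync is allowed to drop exchanges in progress.
    state.pending_exchanges.clear()
    return True


host_a = IkeSaSyncState()
first = handle_sync_request(host_a, failover_count=1)   # genuine sync
host_a.pending_exchanges.update({8, 9})                 # new exchanges start
replayed = handle_sync_request(host_a, failover_count=1)  # attacker replay
```

In this run the first sync is accepted, the replay is rejected, and
exchanges 8 and 9 survive the replay.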

What should host A do if it DOES receive replies for those missing
messages? 

>    o  Similarly, the peer should also not wait for pending (incoming)
>       requests.  For example if the window size is 5 and the peer's
>       window is 3-7 and if the peer has received requests 4, 5, 6, 7 but
>       not 3, then it should send the value 8 in the
>       EXPECTED_RECV_REQ_MESSAGE_ID payload, and should not expect to
>       receive message 3 anymore.
> ...

Again, what should the peer do if it does get those messages? Ignore
them? Process them? If they were sent before the crash, were delayed
in the network, and arrived after the sync message, they most likely
came from the previous incarnation of the other end, and so most
likely need to be ignored.
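
If we go with "ignore", the rule could be as simple as the sketch
below. It assumes that any request with a Message ID below the value
agreed in the sync belongs to the previous incarnation; the function
name is hypothetical, and the draft would still need to say this
explicitly.

```python
def is_stale_request(message_id, synced_expected_recv_id):
    """True if the request predates the Message ID sync.

    Assumption: once the peers have synced on
    EXPECTED_RECV_REQ_MESSAGE_ID, any request with a lower Message ID
    was sent before the failover and is silently dropped.
    """
    return message_id < synced_expected_recv_id


# Numbers from the quoted example: the peer advertised 8, then the
# delayed request 3 finally arrives, followed by a new request 8.
drop_late_3 = is_stale_request(3, synced_expected_recv_id=8)
drop_new_8 = is_stale_request(8, synced_expected_recv_id=8)
```

With these numbers the delayed request 3 is dropped and the new
request 8 is processed normally.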

> Are the Security Considerations in the draft valid anymore at all?  And 
> are the nonce and failover count needed?  Yaron wanted to eliminate these, 
> and I'm all for it.

I think we still need both. The failover counter protects against the
attack I explained in this email (i.e., an attacker replaying a
request), and the nonce protects against attacks where an attacker
tries to replay a response message.
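
The nonce side can be sketched like this, assuming the requester puts
a fresh random nonce in each sync request and only accepts a response
that echoes it. The nonce length and the function names are
assumptions for illustration; the constant-time comparison is not
strictly required here but does no harm.

```python
import hmac
import secrets

NONCE_LEN = 32  # assumed length in bytes, not taken from the draft


def new_sync_nonce():
    """Fresh random nonce for one Message ID sync request."""
    return secrets.token_bytes(NONCE_LEN)


def response_is_fresh(request_nonce, response_nonce):
    """Accept the response only if it echoes the nonce we just sent."""
    return hmac.compare_digest(request_nonce, response_nonce)


old_nonce = new_sync_nonce()      # nonce of an earlier, captured sync
current_nonce = new_sync_nonce()  # nonce of the sync now in progress

genuine_ok = response_is_fresh(current_nonce, current_nonce)
replay_ok = response_is_fresh(current_nonce, old_nonce)
```

A response captured from the earlier sync carries old_nonce, so it
fails the check for the sync now in progress.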