Pekka Riikonen writes:
> I'd like to return to this failover counter.  It's the single issue in the 
> protocol which I don't like.  The reasons are, first, that it makes an 
> assumption how clustering and sync has been implemented, that is this 
> value has to be synced in the cluster and the protocol cannot work without 
> it, and second, that the protocol itself becomes vulnerable to the very 
> issues it tries to combat, namely the fact that sync messages can be 
> dropped.

That depends on how we define the failover counter. If we define it as
a monotonically increasing counter that may skip forward, there are
ways to do it without syncing the information. As most sync protocols
already have either timestamps or some kind of message IDs, using
those as the failover counter would work. I.e., in the simplest case
just use the last failover time as the failover counter.

There is no reason for the failover counter to increment by one for
every failover; its value just needs to be bigger than the previous
failover counter, so we can detect whether this is an old failover
message or something we haven't seen before.
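
As a rough sketch of that acceptance rule (Python; the class and
method names are hypothetical and not from the draft, and counter
wrap-around is ignored here):

```python
class FailoverCounterTracker:
    """Remembers the highest failover counter seen from a peer.

    Hypothetical helper for illustration only; not part of the draft.
    """

    def __init__(self, initial: int = 0) -> None:
        self.highest_seen = initial

    def accept(self, counter: int) -> bool:
        # An equal or smaller value means an old (possibly replayed)
        # failover message, so reject it.
        if counter <= self.highest_seen:
            return False
        # Any strictly larger value is fine; it need not be +1.
        self.highest_seen = counter
        return True
```

With this rule a jump from 1 to 5 is accepted, while a later message
carrying 5 or 3 is rejected.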

> Consider for example three node cluster where node 1 fails.  Node 2 
> increments the failover counter but fails to sync it to node 3 (the packet 
> is dropped or node 2 fails immediately itself).  Node 3 now hasn't the 
> updated failover counter and will create a request that is effectively a 
> replay.

So node 2 should first agree that it really is the one that is going
to take care of the failing node 1's traffic, thus it needs to
communicate with node 3 and agree on that (to prevent both node 2 and
node 3 from acting as failover hosts). During this communication it
can also sync the failover counter.

Also, if node 3 just detects that node 2 crashed by snooping the
traffic, then it can also snoop the IKEv2 message ID sync messages and
get the failover counter from there (it does have the keying material
to decrypt the IKEv2 messages if it is going to be acting as the
failover node).

> The draft has the following text:
> 
>    In case multiple successive failover events and sync request getting
>    lost, the failover count value at peer will not be updated and new
>    standby member will become active with incremented failover count
>    value.  So, peer can receive valid failover count value which is not
>    just incremented by 1 in case of multiple failover.  Accepting
>    incremented failover count within a range is allowed and increases
>    interoperability.
> 
> Which is vague, and I don't even completely understand what it tries to 
> tell here.  In case of multiple failovers the failover counter might *not* 
> be incremented at all because the sync may have been dropped.

I think that text tries to say that node 2 might have synced with
node 3 but failed before it actually managed to send out the
IKEV2_MESSAGE_ID_SYNC message. Thus the other end never knows that
node 2 was active at all, and the first thing it sees is the message
from node 3, where the failover counter has been incremented twice.

I think the text should make clear that the failover counter can be
incremented by any amount, and that a message is accepted if its
failover counter is larger than any previous failover counter the
node has seen.

This kind of text would allow using, for example, timestamps instead
of failover counters on those clusters that have sufficiently
well-synced clocks.
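
One possible way to derive such a value, as a sketch only (Python; the
function name and the fallback to last_known + 1 are my assumptions,
not something the draft specifies):

```python
import time
from typing import Optional


def next_failover_counter(last_known: int,
                          now: Optional[float] = None) -> int:
    """Return a failover counter based on the current time.

    Hypothetical helper: last_known is the largest counter this node
    knows about, which may be stale if a sync message was lost.
    """
    if now is None:
        now = time.time()
    # Use the clock when it is ahead; otherwise still move forward so
    # the value never goes backwards on this node.
    return max(last_known + 1, int(now))
```

If node 3 missed node 2's sync, its last_known is stale, but a later
clock reading still gives a value larger than the one node 2 used,
provided the clocks are reasonably in sync.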
-- 
[email protected]
_______________________________________________
IPsec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/ipsec
