Pekka Riikonen writes:
> I'd like to return to this failover counter. It's the single issue in the
> protocol which I don't like. The reasons are, first, that it makes an
> assumption how clustering and sync has been implemented, that is this
> value has to be synced in the cluster and the protocol cannot work without
> it, and second, that the protocol itself becomes vulnerable to the very
> issues it tries to combat, namely the fact that sync messages can be
> dropped.

Depends on how we define the failover counter. If we define it as a
monotonically increasing counter which can skip forward, there are ways
to do it without syncing the information. As most sync protocols
already have either timestamps or some kind of message IDs, using those
as the failover counter would work. I.e. in the simplest case, just use
the last failover time as the failover counter. There is no reason for
the failover counter to increment by one for every failover; its value
just needs to be bigger than the previous failover counter, so we can
detect whether this is an old failover message or something we haven't
seen before.

> Consider for example three node cluster where node 1 fails. Node 2
> increments the failover counter but fails to sync it to node 3 (the packet
> is dropped or node 2 fails immediately itself). Node 3 now hasn't the
> updated failover counter and will create a request that is effectively a
> replay.

So node 2 should first agree that it really is the one that is going to
take care of the failing node 1's traffic; thus it needs to communicate
with node 3 and agree on that (to prevent both node 2 and node 3 acting
as failover hosts). During this communication it can also sync the
failover counter. Also, if node 3 just detects that node 2 crashed by
snooping the traffic, then it can also snoop the IKEv2 message ID sync
messages and get the failover counter from there (it does have the
keying material to decrypt the IKEv2 messages if it is going to be
acting as the failover node).

> The draft has the following text:
>
> In case multiple successive failover events and sync request getting
> lost, the failover count value at peer will not be updated and new
> standby member will become active with incremented failover count
> value. So, peer can receive valid failover count value which is not
> just incremented by 1 in case of multiple failover. Accepting
> incremented failover count within a range is allowed and increases
> interoperability.
>
> Which is vague, and I don't even completely understand what it tries to
> tell here. In case of multiple failovers the failover counter might *not*
> be incremented at all because the sync may have been dropped.

I think that text tries to say that node 2 might have synced with node 3
but then failed before it actually managed to send out the
IKEV2_MESSAGE_ID_SYNC message; thus the other end never knows that node 2
was active at all, and the first thing it sees is the message from
node 3, where the failover counter has been incremented twice.

I think the current text should make clear that the failover counter can
be incremented by any amount, and that a message is accepted if its
failover counter is larger than any previous failover counter the node
has seen. This kind of text would allow using, for example, a timestamp
instead of a failover counter on those clusters which have good enough
synced clocks.
-- 
[email protected]
_______________________________________________
IPsec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/ipsec
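
For illustration only, the acceptance rule proposed above (accept a message if its failover counter is larger than any previously seen, whatever the size of the jump) could be sketched roughly as follows. This is a minimal sketch, not text from the draft: the names FailoverCounterCheck, accept and timestamp_counter are hypothetical, the counter is treated as an unbounded integer (field width and wrap-around are ignored), and the timestamp variant assumes cluster clocks are synced well enough that a later failover always yields a larger value.

```python
import time


class FailoverCounterCheck:
    """Hypothetical receiver-side state: largest failover counter seen."""

    def __init__(self, initial=0):
        self.highest_seen = initial

    def accept(self, received):
        """Accept only a counter strictly larger than any seen before."""
        if received <= self.highest_seen:
            return False  # old or replayed failover message
        self.highest_seen = received  # a jump of any size is fine
        return True


def timestamp_counter(now=None):
    """Hypothetical: use the failover time (whole seconds) as the counter.

    Assumes cluster members have sufficiently well synced clocks.
    """
    return int(time.time() if now is None else now)


peer = FailoverCounterCheck()
print(peer.accept(1))  # True: first failover seen
print(peer.accept(3))  # True: counter skipped 2, still larger
print(peer.accept(3))  # False: same value again, treated as a replay
print(peer.accept(2))  # False: older than the highest value seen
```

With this rule node 3 does not need to know exactly how many failovers happened before it, only to pick a value larger than anything the peer may have seen, which is what makes a timestamp usable.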
