Hi Paul,

That is correct, and it is what we are doing, with a 10% margin on each
side, when a time-based SA lifetime is used. The numbers I gave were
measured using a byte-based SA lifetime.
We tried the same approach for the byte-based SA lifetime, but the same
strategy does not work -- unless we use very high values, up to 40% of the
SA lifetime.

In our case the traffic is symmetric, that is, the same amount of bytes is
carried in both directions; SA lifetimes are randomized +/- 10%, and each
gateway checks the SA used for decryption. Such randomization (or
difference between the SAs) is not sufficient, because the counter is only
checked approximately every 2 s to determine whether a rekey needs to be
performed, and any traffic burst cancels the differences introduced by the
randomization.
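To illustrate the burst effect, here is a minimal sketch, not our implementation; the 1 GB lifetime, the per-poll burst size, and the mapping of one loop iteration to one 2 s poll are all assumptions made for the example:

```python
import random

# Hypothetical sketch: a byte-based SA lifetime of 1 GB, randomized
# +/- 10% independently on each peer, with the byte counter polled
# once per 2 s interval (one loop iteration here).
LIFETIME = 1_000_000_000
lim_a = LIFETIME * random.uniform(0.9, 1.1)  # peer A's randomized limit
lim_b = LIFETIME * random.uniform(0.9, 1.1)  # peer B's randomized limit

# Symmetric traffic: both peers see the same byte count. A 300 MB
# burst per poll exceeds the 200 MB maximum spread between the two
# randomized limits.
count = 0
poll = 0
poll_a = poll_b = None
while poll_a is None or poll_b is None:
    poll += 1
    count += 300_000_000
    if poll_a is None and count >= lim_a:
        poll_a = poll  # poll at which peer A decides to rekey
    if poll_b is None and count >= lim_b:
        poll_b = poll  # poll at which peer B decides to rekey

# Both peers decide to rekey during the same polling interval.
print(poll_a == poll_b)  # → True
```

Whatever the two randomized limits are, a single burst larger than their maximum spread pushes both counters past both limits within one polling interval, so the rekeys still coincide.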
In our current configuration, initiator and responder are effectively
interchangeable, which mostly results from the fact that they look at
different SAs AND that the traffic is relatively symmetric.
The ability to agree on which of the peers handles the rekey is expected
to prevent such simultaneous rekeys, as the behavior becomes predictable
between two peers and across different implementations.
There is still a need to introduce some randomness to prevent multiple
rekeys from happening at the same time. Assigning a rekey responsibility
requires one peer to look at both unidirectional SAs, as opposed to one,
but over the full number of SAs the resources associated with this
responsibility remain the same as those considered today.
What we need to make sure of is that a rekey occurs even if the peer
designated to rekey does not fulfill its commitment.
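A deterministic rule of this kind could look as follows; this is a hypothetical sketch, not a protocol definition, and the SPI-comparison tie-break, the function name, and the numbers are assumptions. The designated peer rekeys at the soft lifetime, and the other peer acts only as a fallback after a grace period, which both avoids the simultaneous rekey and still guarantees a rekey if the designated peer fails to act:

```python
def rekey_time(own_spi, peer_spi, soft_lifetime, grace):
    """Seconds after SA creation at which this peer initiates a rekey.

    Both peers evaluate the same rule on the same pair of values, so
    they always agree on who is responsible. The non-responsible peer
    rekeys `grace` seconds later, and only if the designated peer has
    not done so by then.
    """
    responsible = own_spi > peer_spi  # assumed tie-break rule
    return soft_lifetime if responsible else soft_lifetime + grace

# The peer with the larger SPI rekeys at the soft lifetime; the other
# peer is the fallback.
print(rekey_time(0xB, 0xA, 3600, 360))  # → 3600
print(rekey_time(0xA, 0xB, 3600, 360))  # → 3960
```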

If we extend the case to asymmetric traffic (with our implementation), the
asymmetry of the traffic competes with the difference in SA lifetimes,
which, for a given SA, results in the randomness either increasing or
decreasing the probability of a simultaneous rekey. So here also, we
believe that being able to specify a behavior will prevent, or at least
limit, simultaneous rekeys.
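A rough Monte-Carlo sketch of that competition, under assumed numbers (1 GB byte limit, +/- 10% randomization, steady per-poll rates; the function name is ours):

```python
import random

def simultaneous(rate_a, rate_b, trials=10_000):
    """Fraction of trials in which both peers cross their randomized
    1 GB byte limits during the same poll (rates are bytes per poll)."""
    hits = 0
    for _ in range(trials):
        lim_a = 1e9 * random.uniform(0.9, 1.1)
        lim_b = 1e9 * random.uniform(0.9, 1.1)
        # poll at which each direction's counter crosses its limit
        poll_a = -(-lim_a // rate_a)  # ceiling division
        poll_b = -(-lim_b // rate_b)
        hits += poll_a == poll_b
    return hits / trials

# With symmetric rates the randomization alone leaves a sizable
# collision fraction; this asymmetric rate pair pulls the crossing
# polls apart, while other rate pairs can push them back together.
print(simultaneous(5e7, 5e7))  # symmetric traffic
print(simultaneous(5e7, 6e7))  # asymmetric traffic
```

The point of the sketch is only that the collision probability moves with the rate ratio in either direction, which is why a specified behavior is preferable to relying on the randomness.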

Yours,
Daniel

On Tue, Nov 30, 2021 at 2:42 PM Paul Wouters <paul.wout...@aiven.io> wrote:

>
> On Tue, Nov 30, 2021 at 8:21 AM Daniel Migault <mglt.i...@gmail.com>
> wrote:
>
>>
>> Thank you all for the comments. I believe there is a misunderstanding of
>> the resource issue we are facing, so please find below a more detailed
>> description.
>>
>> The resource in question is neither the CPU, nor the memory, nor the
>> bandwidth, as the comments seem to suggest; it is the number of table
>> entries that perform the IPsec processing, which varies between 720 and
>> 4000 depending on the hardware chip.
>>
>
> Okay.
>
>
>> Randomizing the SA lifetime is a probabilistic approach we - and our
>> customers - do not want to rely on, even if the probability of collision
>> decreases with the SA lifetime.
>
> So now I am a bit confused. If you are mostly concerned about table
> length, then 1 double IPsec SA won't hurt you, but 4000 will. So removing
> a random percentage of seconds from the initiator's lifetime would
> probabilistically fix your issue. With 4000 connections with a lifetime
> of 3600s that all start at the same time, a 20% fuzz on the initiator
> would cause you an extra 5 or 6 SAs. If your code recognises the new SA
> traffic, and there is frequent traffic, these double SAs would go away in
> seconds. So on a unit that can do 4000 SAs, you can now do only 3994 SAs.
>
> Am I making a mistake here ?
>
> Paul
>
>
>
>

-- 
Daniel Migault
Ericsson
_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec
