[IPsec] Re: [[email protected]: New Version Notification for draft-antony-ipsecme-ikev2-fragment-acknowledgment-01.txt]

Valery Smyslov Wed, 18 Mar 2026 16:32:39 -0700

Hi Graham,

> Hi Antony
> 
> I haven't read your draft, but will do and will comment if it helps.
> 
> FYI I pinged the text below to Valery over the weekend after reading his draft
> (which I liked).
> 
> Since sending the text below, I had a thought about a super long bitmask
> potentially becoming fragmented itself. However, I'm not sure how likely that
> would be.


If the bitmap becomes too long, you can send several receipt status messages,
each containing only a part of the whole bitmap (this is what the First
Fragment Num and the Last Fragment Num fields are for). But I didn't do
experiments with this scenario.
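
As a rough illustration of that splitting (a sketch only, not taken from
either draft): the code below cuts a per-fragment receipt bitmap into chunks,
each tagged with the first and last fragment number it covers. The function
name, the max_bits limit, and the most-significant-bit-first order are
assumptions made for the example, not the wire format.

```python
from typing import Iterator, List, Tuple


def split_receipt_status(
    received: List[bool], max_bits: int
) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (first_fragment_num, last_fragment_num, bitmap) chunks.

    received[i] is True if fragment number i + 1 has arrived
    (fragment numbers are 1-based, as in RFC 7383).
    max_bits is a hypothetical per-message limit on bitmap length.
    """
    if max_bits <= 0:
        raise ValueError("max_bits must be positive")
    for start in range(0, len(received), max_bits):
        chunk = received[start:start + max_bits]
        bitmap = bytearray((len(chunk) + 7) // 8)
        for offset, got in enumerate(chunk):
            if got:
                # Assumed bit order: most significant bit first.
                bitmap[offset // 8] |= 0x80 >> (offset % 8)
        yield start + 1, start + len(chunk), bytes(bitmap)
```

Each yielded tuple would go into its own receipt status message, so no single
message has to carry the whole bitmap.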

Regards,
Valery.

> cheers
> 
> ////
> 
> 4.2.1.5.  Implementation Details
> 
> 
> 
> When a sender uses the techniques described in Sections 4.1.1 (randomized
> fragment ordering) and 4.1.2 (inter-fragment delays), a receiver cannot
> immediately distinguish between fragments that have been lost in transit and
> fragments that are still en route due to deliberate pacing by the sender or
> reordering in the network. A receiver that sends a Receipt Status Message
> (Section 4.2.1.4) prematurely, e.g., prior to all fragments having had
> reasonable time to arrive, will request retransmission of fragments that are
> not in fact lost. This can result in unnecessary duplicate traffic, wasted
> bandwidth, and in severe cases a feedback loop of spurious retransmissions
> that worsens the congestion that the unilateral techniques were designed to
> alleviate.
> 
> 
> 
> To mitigate this, implementations SHOULD adopt a fragment collection
> strategy that accounts for the expected arrival pattern of fragments.
> This document defines three approaches, in order of increasing complexity. An
> implementation MAY support more than one and allow the operator to select
> the appropriate mode based on deployment characteristics.
> 
> 
> 
> 4.2.1.5.1.  Strict Mode
> 
> 
> 
> In Strict Mode, the receiver MUST NOT send a Receipt Status Message until a
> configurable hold-down timer has expired after receipt of the first fragment
> of a message. The hold-down timer value SHOULD be set by the operator based
> on knowledge of the network characteristics between the peers.
> 
> 
> 
> The following guidance is provided for selecting hold-down timer values:
> 
> 
> 
> - For low-latency, high-reliability networks (e.g., data centre interconnects,
> enterprise LAN): a hold-down timer of 50-200 milliseconds is RECOMMENDED.
> On such networks, fragments that have not arrived within this window are
> almost certainly lost.
> 
> - For typical Internet paths with moderate latency: a hold-down timer of 500
> milliseconds to 2 seconds is RECOMMENDED.
> 
> - For high-latency or bandwidth-constrained links (e.g., satellite
> communications, congested mobile networks): a hold-down timer of 3-10
> seconds or more may be necessary. On such links, propagation delay alone can
> be several hundred milliseconds, and the sender may be deliberately pacing
> fragments over an extended period. Operators SHOULD set the hold-down
> timer to at least twice the expected one-way propagation delay of the link.
> 
> 
> 
> If no hold-down timer is configured, the implementation MUST use a default
> value of no less than 1 second.
> 
> 
> 
> 4.2.1.5.2.  Relaxed Mode
> 
> 
> 
> In Relaxed Mode, the receiver tracks the arrival times of incoming fragments
> and MUST NOT send a Receipt Status Message while fragments are still
> arriving at a steady rate. The receiver SHOULD send a Receipt Status Message
> only after a quiescence period during which no new fragments have been
> received.
> 
> The quiescence period SHOULD be set to at least twice the observed mean
> inter-arrival time of fragments received so far in the current exchange. This
> allows the receiver to adapt to the sender's actual pacing behaviour without
> prior configuration.
> 
> 
> 
> Relaxed Mode is suitable for deployments where the network characteristics
> are unknown or variable, as it requires no operator configuration. However, it
> may be slower to react to genuine loss than Strict Mode with a well-tuned
> timer.
> 
> 
> 
> 4.2.1.5.3.  Adaptive Mode
> 
> 
> 
> In Adaptive Mode, the receiver combines both approaches. It uses a
> configurable minimum hold-down timer (as in Section 4.2.1.5.1) and
> additionally applies quiescence detection (as in Section 4.2.1.5.2).
> The receiver MUST NOT send a Receipt Status Message until both conditions
> are met: the hold-down timer has expired AND no new fragments have arrived
> for the quiescence period.
> 
> This mode is RECOMMENDED for general-purpose implementations as it
> provides a safety floor via the timer while adapting to actual network
> conditions via quiescence detection.
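
A minimal sketch of the Adaptive Mode decision described above, assuming
timestamps in seconds from a monotonic clock. The class and method names are
hypothetical, and falling back to the hold-down value when fewer than two
fragments have been seen is an assumption, since the text does not cover that
case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FragmentCollector:
    min_holddown: float  # operator-configured floor, in seconds
    arrivals: List[float] = field(default_factory=list)

    def on_fragment(self, now: float) -> None:
        """Record the arrival time of a new (non-duplicate) fragment."""
        self.arrivals.append(now)

    def quiescence_period(self) -> float:
        """Twice the observed mean inter-arrival time so far."""
        if len(self.arrivals) < 2:
            # Assumption: no gap observed yet, so reuse the hold-down value.
            return self.min_holddown
        span = self.arrivals[-1] - self.arrivals[0]
        return 2.0 * span / (len(self.arrivals) - 1)

    def may_send_receipt_status(self, now: float) -> bool:
        """True only when BOTH the hold-down and quiescence conditions hold."""
        if not self.arrivals:
            return False
        holddown_expired = now - self.arrivals[0] >= self.min_holddown
        quiet = now - self.arrivals[-1] >= self.quiescence_period()
        return holddown_expired and quiet
```

Dropping the hold-down check gives Relaxed Mode, and dropping the quiescence
check gives Strict Mode.
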
> 
> 
> 
> 4.2.1.5.4.  Interaction with IKEv2 Retransmission Timers
> 
> 
> 
> Implementations that use short initial retransmission timers with exponential
> back-off (as is common in deployed IKEv2 implementations) MUST ensure that
> the fragment collection hold-down period is considered when calculating
> retransmission timeouts. If the sender's retransmission timer fires before the
> receiver has had time to collect all fragments and respond with a Receipt
> Status Message, the sender will retransmit the entire message (or the first
> fragment per Section 4.1.3), defeating the purpose of selective 
> retransmission.
> 
> Specifically, when a sender is transmitting a large and fragmented message and
> is aware that selective retransmission may be in use, the sender's
> retransmission timer for that exchange SHOULD be set to a value no less than
> the time required to transmit all fragments (including any inter-fragment
> delays) plus a reasonable allowance for the receiver to process the fragments
> and return a Receipt Status Message.
> 
> 
> 
> 4.2.1.5.5.  Considerations for Bandwidth-Constrained and High-Latency
> Networks
> 
> 
> 
> On satellite communication links and other high-latency, low-bandwidth
> networks, the interaction between the techniques described in this document
> requires particular care. These networks exhibit the combination of high
> propagation delay (often 250ms or more one-way for geostationary links),
> limited bandwidth that makes congestion from spurious retransmissions
> particularly costly, and higher baseline packet loss rates that make selective
> retransmission most valuable.
> 
> 
> 
> This creates a tension: the receiver benefits most from selective
> retransmission (because fragments are more likely to be genuinely lost), but
> must also wait longest before requesting it (because fragments take longest to
> arrive). Implementations deployed in these environments SHOULD use
> Adaptive Mode with a hold-down timer of at least one full round-trip time of
> the link and SHOULD err on the side of caution when in doubt.
> 
> 
> 
> 4.2.1.5.6.  Fragment Count as a Receiver Heuristic
> 
> 
> 
> The Total Fragments field in the Encrypted Fragment payload (Section
> 2.5 of [RFC7383]) is available to the receiver from the moment the first
> fragment arrives. This value provides a useful implicit signal that the
> receiver MAY use to adjust its fragment collection behaviour without
> requiring any protocol extension or negotiation.
> 
> A message fragmented into a small number of fragments (e.g., fewer than 20)
> is likely to be fully transmitted by the sender within a short time window,
> even with inter-fragment delays. A message fragmented into a large number of
> fragments (e.g., 100 or more) will take substantially longer to transmit,
> particularly when the sender is using the rate-limiting technique of Section
> 4.1.2. The receiver can use the Total Fragments value to scale its hold-down
> timer or quiescence period accordingly.
> 
> 
> 
> The following approach is RECOMMENDED. Implementations SHOULD allow
> the operator to configure a per-fragment delay estimate (in
> milliseconds) representing the expected inter-fragment spacing used by the
> sender. The receiver then calculates an adjusted hold-down timer
> as:
> 
> 
> 
> adjusted_holddown = base_holddown + (Total_Fragments * per_fragment_delay)
> 
> 
> 
> where base_holddown is the hold-down timer value as described above.
> This ensures that the receiver's time-out window scales linearly with the
> size of the message being received.
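
For concreteness, the formula above works out as in the following sketch. All
values are in milliseconds; the function name and the input checks are
assumptions for the example.

```python
def adjusted_holddown_ms(
    base_holddown_ms: int, total_fragments: int, per_fragment_delay_ms: int
) -> int:
    """adjusted_holddown = base_holddown + (Total_Fragments * per_fragment_delay)."""
    if total_fragments < 1:
        raise ValueError("Total Fragments must be at least 1")
    if base_holddown_ms < 0 or per_fragment_delay_ms < 0:
        raise ValueError("timer values must be non-negative")
    return base_holddown_ms + total_fragments * per_fragment_delay_ms
```

With a 1000 ms base and a 20 ms per-fragment estimate, a 20-fragment message
gives 1400 ms and a 200-fragment message gives 5000 ms.
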
> 
> Senders can influence receiver behaviour through their choice of fragment
> size, which determines the Total Fragments count.
> 
> 
> 
> A sender on a low-latency, high-bandwidth link MAY choose a smaller
> fragment size (producing more fragments) if it determines that the receiver or
> intermediate network can handle the higher packet rate.
> Conversely, a sender on a high-latency or bandwidth-constrained link (e.g.,
> satellite communication) SHOULD use a larger fragment size where possible to
> reduce the total number of fragments, thereby reducing both the transmission
> time and the window during which the receiver must wait before concluding
> that fragments are missing.
> 
> 
> 
> The following guidance is provided for sender fragment size selection based
> on network characteristics:
> 
> 
> 
> - On low-latency, high-bandwidth networks: the sender MAY use the minimum
> fragment size (i.e., the path MTU minus IKEv2 overhead), as the receiver can
> absorb a high packet rate and the resulting large fragment count should not
> cause excessive delay before selective retransmission can engage.
> 
> - On moderate-latency Internet paths: the sender SHOULD use the path MTU
> as the fragment size, which is the default behaviour defined in [RFC7383].
> 
> - On high-latency or bandwidth-constrained links: the sender SHOULD avoid
> producing an unnecessarily large number of fragments. Where the path MTU
> permits, a fragment size larger than the minimum SHOULD be used. The
> trade-off is that larger fragments are more costly to retransmit individually
> if lost, but the reduced fragment count allows the receiver to engage
> selective retransmission sooner with greater confidence that gaps represent
> genuine loss.
> 
> 
> 
> Implementations SHOULD allow the operator to configure the fragment size or
> to select a network profile (e.g., "low-latency", "internet",
> "satellite") that sets appropriate defaults for both the fragment size and the
> receiver's hold-down parameters.
> 
> 
> 
> 4.2.1.5.7.  Duplicate Fragment Handling
> 
> 
> 
> Regardless of the mode in use, a receiver that has already successfully
> processed a fragment and subsequently receives a duplicate (whether from a
> spurious retransmission or some form of network duplication) MUST silently
> discard the duplicate.
> Implementations MUST NOT treat receipt of a duplicate fragment as an error
> condition.
> 
> 
> 
> On Wed, Mar 18, 2026 at 3:03 PM Antony Antony <[email protected]>
> wrote:
> >
> > Hi Valery,
> >
> > Thanks for taking the time to present your draft at tomorrow's session.
> > I quickly went through your slides — appreciate you including a
> > comparison with our draft. I am sorry for my delayed response!
> > I updated our draft back in January, and didn't get around to responding
> > to it.
> >
> > Thanks for the numbers in your slides. They give a better picture.
> >
> > On Thu, Dec 11, 2025 at 03:19:18PM +0300, Valery Smyslov wrote:
> > > Hi Antony,
> > >
> > > please, see inline.
> > >
> > > > Hi Valery,
> > > > Thank you for the detailed feedback.
> > > >
> > > > I have been looking through the simultaneous-initiation case you
> > > > describe, where both peers have just completed an IKE SA rekey and
> > > > therefore begin with Message ID 0 on each side.  One situation can
> > > > be slightly problematic when there are delayed responses; however, I
> > > > don't see any case where the proposed ack would fail to advance the
> > > > negotiation.
> > > >
> > > > Still, to make it clear, at the end I am proposing two
> > > > direction-specific Notifiers instead of one.
> > >
> > > This would help. However, it won't work if some future (imaginary)
> > > IKE extension makes each exchange use a different key (e.g., as
> > > KDF(SK_ex, MSG-ID)).
> >
> > Once this draft is standardized, any such future (imaginary) extension
> > would need to accommodate the existing mechanism regardless.  More
> > importantly, your proposal has the same property: the Receipt Status
> > Message is sent with the same Message ID as the original exchange, so
> > you also have two messages sharing a Message ID — the receipt status
> > and the actual IKE response. The concern applies equally to both drafts.
> >
> > As for the Message ID as AEAD counter: yes, implementations need to
> > handle this carefully, but this is less of a protocol correctness issue.
> > Implementations can track the context and create a monotonic counter as
> > the IV.
> >
> > >
> > > > Here is how I see the case you described. I am using
> > > > CREATE_CHILD_SA as an example. The analysis would be similar for
> > > > other exchanges too.
> > > >
> > > > 1. Simultaneous CREATE_CHILD_SA requests after rekey
> > > >
> > > > In the simplest case:
> > > >
> > > > ---- IKE SA Rekeyed both ends Message ID 0 Request
> > > > Initiator                               Responder
> > > >
> > > > MID(0) CREATE_CHILD_SA ---->           <------ MID(0) CREATE_CHILD_SA
> > > > FACK(MID=0, response flag=1) --->      <------ FACK(MID=0, response flag=1)
> > > >
> > > > Since each peer knows it has an outstanding request with MID=0,
> > > > the received FACK(MID=0,R=1) can be unambiguously associated with
> > > > its own outstanding request.
> > >
> > > Yes.
> > >
> > > > 2. Case where one peer has advanced its CREATE_CHILD_SA exchange
> > > > and the response is lost
> > > >
> > > > A more interesting scenario is when both peers send the
> > > > CREATE_CHILD_SA request, but one peer sends its response and then
> > > > advances its internal state, while the response is lost:
> > > >
> > > > The actual CREATE_CHILD_SA response fragments are lost.  And the
> > > > initiator responds with FACK(MID=0, response flag)
> > > >
> > > > MID(0) CREATE_CHILD_SA ---->         <------ MID(0) CREATE_CHILD_SA
> > > > FACK(MID=0, response flag=1) --->    <------ Partial Retransmit (MID=0)
> > > >
> > > > MID(0) CREATE_CHILD_SA response flag=1 ---->
> > > >
> > > >                    <------ MID(0) CREATE_CHILD_SA response flag=1 ---->
> > > >                    <------ FACK(MID=0, response flag=1)
> > > >
> > > > Here, once the responders have advanced past CREATE_CHILD_SA, any
> > > > FACK it receives later clearly corresponds to the response it sent.
> > > > The initiator can correctly attribute that FACK to the outstanding
> > > > response it is waiting for.
> > >
> > > I meant the case:
> > >
> > > MID(0) CREATE_CHILD_SA ---->         <------ MID(0) CREATE_CHILD_SA
> > > FACK(MID=0, response flag=1) --->     (1) (delayed)
> > >
> > >                       <---- MID(0) CREATE_CHILD_SA response flag=1
> > > FACK(MID=0, response flag=1) --->     (2)
> > >                                       (1 received)
> > >
> > > Message (1) is the FACK response to the responder's request while
> > > message (2) is the FACK response to the responder's response to the
> > > initiator's request.
> > > The responder cannot distinguish these two messages.
> > > I agree that making the content different would help (but see
> > > above), but in general this is a headache to implement (since it
> > > violates the steps by which the incoming message is processed - it is
> > > processed in the context of a particular exchange that is determined
> > > before the message is parsed).
> > >
> > > > 3. Delayed or misordered FACK messages
> > > >
> > > > I agree there are corner cases where a delayed FACK may arrive late
> > > > and overlap with another exchange with the same MID, 0 in this case.
> > > > However, in these cases processing the FACK as a hint rather than
> > > > a state-advancing message does not break protocol correctness.
> > > > At worst, a late FACK would simply cause an extra re-transmit of
> > > > fragments that already arrived.
> > > >
> > > > Addressing your core concern: distinguishing request-side vs
> > > > response-side acknowledgments
> > > >
> > > > To address the case where a FACK for a request and a FACK for a
> > > > response may look identical (same MID, same exchange type, same R
> > > > flag), I agree this could lead to an unnecessary ambiguity in
> > > > simultaneous-initiation scenarios.
> > > >
> > > > To resolve this cleanly, I propose defining two separate Notify
> > > > Status Types:
> > > >
> > > > FRAGMENT_ACK_REQ — acknowledgment of fragments belonging to a
> > > > request
> > > >
> > > > FRAGMENT_ACK_RES — acknowledgment of fragments belonging to a
> > > > response
> > > >
> > > > These two notifiers would make the semantic direction explicit,
> > > > eliminating any ambiguity you describe even in simultaneous
> > > > exchanges with identical Message IDs.
> > >
> > > > More responses below inline.
> > > >
> > > > On Wed, Nov 26, 2025 at 03:24:15PM +0300, Valery Smyslov wrote:
> > > > > Hi Antony,
> > > > >
> > > > > I doubt that this proposal is workable, at least in some situations.
> > > > > Consider the IKE SA was just rekeyed, so that each peer starts
> > > > > its first exchange with Message ID = 0. And consider they
> > > > > simultaneously initiate same exchange, say CREATE_CHILD_SA. And
> > > > > consider the response messages need fragmentation. Then the
> > > > > "response to response" messages will have the same Message ID (0)
> > > > > and the same exchange type and the same "response flag" as the
> > > > > regular response message for the other exchange. Moreover, they
> > > > > both can have the same content - FRAGMENT_ACK notify.
> > > > > It is impossible for the receiver to find the exchange this message
> > > > > belongs to.
> > > > > (OK, I can imagine a lot of possible approaches in this
> > > > > situation - e.g., ignore such messages or process them for both
> > > > > exchanges since it is only a hint, but this decreases the value of
> > > > > this extension).
> > > > >
> > > > > In addition, you have to disable (or somehow tweak) a replay
> > > > > protection mechanism in IKEv2 since you should be able to process
> > > > > different messages with the same Message ID.
> > > > > And you already said that retransmission behavior of responders is
> > > > > also changed.
> > > > >
> > > > > Overall, the proposed solution looks like a protocol hack to me
> > > > > and I'm not sure it is so easy to implement (taking into
> > > > > consideration all possible cases).
> > > > >
> > > > > I think that depending on the nature of packet loss and the
> > > > > maximum size of the message, several approaches are possible.
> > > > >
> > > > > 1. If the message size is of few tens of Kbytes (so that the number
> > > > >     of fragments is few tens), then the simplest solution would be
> > > > >     either to randomize the order fragments are sent when
> > > > >     retransmitted (or just shift them) and/or add some small delay
> > > > >     (20-50 ms) between sending each fragment. This will cope with
> > > > >     the situation when the network is quickly saturated or the
> > > > >     receiver's buffers are too small and the receiver's performance
> > > > >     is insufficient. In this case only the first few fragments are
> > > > >     processed and the rest is dropped. Both solutions (changing the
> > > > >     order of fragments and introducing delay) should help. They are
> > > > >     both easy to implement and don't require protocol change.
> > > >
> > > > This is a good idea. Thanks.
> > > > Also note RFC7383 states every retransmit must include the first
> > > > segment. Our proposal relaxes this requirement when responding to
> > > > FRAGMENT_ACK_*, because the first is received.
> > >
> > > This is incorrect, RFC 7383 does not contain this requirement.
> > > RFC 7383 says (or tries to say) that when responder has already sent
> > > the (possibly fragmented) response and it receives some
> > > (retransmitted or delayed) fragments of the request (which the responder
> > > has already processed), then the responder must only re-send its response
> > > if the received fragment number is 1 (the first fragment).
> > >
> > > Thus, the first fragment has a special meaning for the responder
> > > when it decides whether to re-send the response, but the initiator is free
> > > to send any subset of fragments at any time (as well as the responder).
> > >
> > > > > 2. If the message size is of several hundreds of Kbytes (so that the
> > > > >     number of fragments is few hundreds), then the above approach
> > > > >     might not help. In this situation your proposal may not help
> > > > >     either, because the size of FRAGMENT_ACK can grow so much that
> > > > >     the message containing it would be fragmented itself. In
> > > > >     addition, if the reason for the packet loss is also network
> > > > >     saturation or insufficient buffer size on the receiver, then
> > > > >     even with individual acks the process may still not converge
> > > > >     (you still send a lot of extra data with each retransmission,
> > > > >     which adds to the problem). In this situation the preferred
> > > > >     solution would be to redefine IKE exchanges, perhaps splitting
> > > > >     them into two sub-exchanges, where a peer sends a series of
> > > > >     fragments one by one, each individually acknowledged (and not
> > > > >     all fragments at once).
> > > > >
> > > > > 3. If the message size is more than 1 Mbyte, then it is not possible
> > > > >     to use UDP with IKE fragmentation in its current form regardless
> > > > >     of how fragments are sent and acknowledged, because the number
> > > > >     of fragments is limited to 2^16, thus TCP should be used.
> > > >
> > > > Yes. This is out of scope until IKEv2 extends fragment numbers,
> > > > which at this point I think is a simple update to RFC7383 to extend
> > > > "Total Fragments" and "Fragment Number" to 32 bit numbers from the
> > > > current 16 bits. I tried to write it down! The proposed Fragment Ack
> > > > could support 32bit versions as well.
> > >
> > > I don't think that extending fragments number to 2^32 has practical sense.
> > > With 2^16 and the size of fragment around 500 bytes it is enough to
> > > transfer 32 Mbytes of data. I'm very skeptical that even with the help of
> > > acks but w/o any congestion control transferring that much data will go
> > > smoothly.
> > >
> > > > > And if network just randomly drops packets (I assume there is no
> > > > > congestion problems), then your proposal won't help much (in my
> > > > > opinion).
> > > > >
> > > > > I believe we are now at situation #1. Thus I think that simpler
> > > > > approaches should help.
> > > > > If we sometime reach situation #2 (e.g., if we use Classic
> > > > > McEliece with the smallest public keys), then proposals like yours can
> > > > > be considered (but I prefer less hacking approaches).
> > > >
> > > > I am trying to be a bit less hacky with two notifiers!
> > >
> > > Thinking more about this I came up with an alternative proposal:
> > > https://datatracker.ietf.org/doc/draft-smyslov-ipsecme-ikev2-fragm-large-msg/
> > >
> > > Comparing to yours it has (as I believe) the following advantages:
> > > - request/response semantics is preserved - no "response to response"
> > > - retransmission logic is preserved - initiator is always an active side
> > > - IKE replay protection is not affected
> > > - no layer violation - the extension can be entirely implemented in the
> > >   IKE fragmentation code, upper layers (e.g., message parsing and forming)
> > >   are not affected
> > > - RFC 7383 PMTU discovery is supported
> > > - traffic overhead is smaller in most cases (but I agree that not in all)
> > > - receipt status messages are protected against replays
> > > - no negotiation is needed (not a real advantage, just a feature
> > >   that can be changed in future)
> > >
> > > My proposal also has one small hack (or a trick), but it is not
> > > immanent to the proposal, there are several ways how to avoid it
> > > (and perhaps it is not needed at all, this is just in case).
> >
> > The ICV trick is interesting. It is smart, but I wonder whether it would be
> > an interop risk: a non-supporting peer sees an ICV failure and must decide
> > whether to re-check. No negotiation means no clean capability
> > signaling. Using a notifier is my preference. I vote to negotiate.
> >
> > Most of the other points are, in my opinion, a matter of design
> > preference, and I have mine. One concrete reason I strongly prefer ranges
> > over a bitmap: ranges are far easier to inspect in practice, both in
> > Wireshark dissectors and in plain log output, which matters for
> > diagnostics and interop testing.
> > A bitmap requires bit-level decoding; a (start, count) pair is
> > immediately human-readable.
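
To illustrate the (start, count) encoding mentioned above, here is a small
sketch that turns a set of fragment numbers into such pairs. The function name
is hypothetical and the output is plain Python tuples for illustration, not
the Notify payload layout of either draft.

```python
from typing import Iterable, List, Tuple


def to_ranges(fragment_numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse fragment numbers into sorted (start, count) pairs."""
    ranges: List[Tuple[int, int]] = []
    for num in sorted(set(fragment_numbers)):
        if ranges and num == ranges[-1][0] + ranges[-1][1]:
            # num directly follows the last range, so extend it by one.
            start, count = ranges[-1]
            ranges[-1] = (start, count + 1)
        else:
            ranges.append((num, 1))
    return ranges
```

For fragments 1, 2, 3, 7, 8 and 10 this gives (1, 3), (7, 2) and (10, 1),
which can be read directly in a log line.
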
> >
> > The remaining concerns you raised are addressed in v3 of our draft:
> >
> > I am also open to merging the two approaches: keep Valery's ICV trick
> > to avoid negotiation, but use Notify payloads with ranges instead of a
> > bitmap. This would combine the cleaner diagnostics and human-readable
> > encoding of ranges with the no-negotiation property of Valery's design.
> >
> > Would others in the WG like to weigh in?
> >
> > Looking forward to tomorrow's presentation, and hoping we have time
> > during the session to discuss both drafts.
> >
> > regards,
> > -antony
> >
> > _______________________________________________
> > IPsec mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]

_______________________________________________
IPsec mailing list -- [email protected]
To unsubscribe send an email to [email protected]
