Hi Graham,

> Hi Antony
>
> I haven't read your draft, but will do and will comment if it helps.
>
> FYI I pinged the text below to Valery over the weekend after reading his
> draft (which I liked).
>
> Since sending the text below, I had a thought about a super long bitmask
> potentially becoming fragmented itself. However, I'm not sure how likely
> that would be.

If the bitmap becomes too long, you can send several receipt status
messages, each containing only a part of the whole bitmap (this is what
the First Fragment Num and the Last Fragment Num fields are for).
But I didn't do experiments with this scenario.

Regards,
Valery.

> cheers
>
> ////
>
> 4.2.1.5. Implementation Details
>
> When a sender uses the techniques described in Sections 4.1.1 (randomized
> fragment ordering) and 4.1.2 (inter-fragment delays), a receiver cannot
> immediately distinguish between fragments that have been lost in transit
> and fragments that are still en route due to deliberate pacing by the
> sender or reordering in the network. A receiver that sends a Receipt
> Status Message (Section 4.2.1.4) prematurely, e.g., before all fragments
> have had reasonable time to arrive, will request retransmission of
> fragments that are not in fact lost. This can result in unnecessary
> duplicate traffic, wasted bandwidth, and in severe cases a feedback loop
> of spurious retransmissions that worsens the congestion that the
> unilateral techniques were designed to alleviate.
>
> To mitigate this, implementations SHOULD adopt a fragment collection
> strategy that accounts for the expected arrival pattern of fragments.
> This document defines three approaches, in order of increasing
> complexity. An implementation MAY support more than one and allow the
> operator to select the appropriate mode based on deployment
> characteristics.
>
> 4.2.1.5.1. Strict Mode
>
> In Strict Mode, the receiver MUST NOT send a Receipt Status Message until
> a configurable hold-down timer has expired after receipt of the first
> fragment of a message. The hold-down timer value SHOULD be set by the
> operator based on knowledge of the network characteristics between the
> peers.
>
> The following guidance is provided for selecting hold-down timer values:
>
> - For low-latency, high-reliability networks (e.g., data centre
>   interconnects, enterprise LAN): a hold-down timer of 50-200
>   milliseconds is RECOMMENDED. On such networks, fragments that have not
>   arrived within this window are almost certainly lost.
>
> - For typical Internet paths with moderate latency: a hold-down timer of
>   500 milliseconds to 2 seconds is RECOMMENDED.
>
> - For high-latency or bandwidth-constrained links (e.g., satellite
>   communications, congested mobile networks): a hold-down timer of 3-10
>   seconds or more may be necessary. On such links, propagation delay
>   alone can be several hundred milliseconds, and the sender may be
>   deliberately pacing fragments over an extended period. Operators
>   SHOULD set the hold-down timer to at least twice the expected one-way
>   propagation delay of the link.
>
> If no hold-down timer is configured, the implementation MUST use a
> default value of no less than 1 second.
>
> 4.2.1.5.2. Relaxed Mode
>
> In Relaxed Mode, the receiver tracks the arrival times of incoming
> fragments and MUST NOT send a Receipt Status Message while fragments are
> still arriving at a steady rate. The receiver SHOULD send a Receipt
> Status Message only after a quiescence period during which no new
> fragments have been received.
>
> The quiescence period SHOULD be set to at least twice the observed mean
> inter-arrival time of fragments received so far in the current exchange.
> This allows the receiver to adapt to the sender's actual pacing behaviour
> without prior configuration.
>
> Relaxed Mode is suitable for deployments where the network
> characteristics are unknown or variable, as it requires no operator
> configuration. However, it may be slower to react to genuine loss than
> Strict Mode with a well-tuned timer.
>
> 4.2.1.5.3. Adaptive Mode
>
> In Adaptive Mode, the receiver combines both approaches.
> It uses a configurable minimum hold-down timer (as in Section 4.2.1.5.1)
> and additionally applies quiescence detection (as in Section 4.2.1.5.2).
> The receiver MUST NOT send a Receipt Status Message until both conditions
> are met: the hold-down timer has expired AND no new fragments have
> arrived for the quiescence period.
>
> This mode is RECOMMENDED for general-purpose implementations as it
> provides a safety floor via the timer while adapting to actual network
> conditions via quiescence detection.
>
> 4.2.1.5.4. Interaction with IKEv2 Retransmission Timers
>
> Implementations that use short initial retransmission timers with
> exponential back-off (as is common in deployed IKEv2 implementations)
> MUST ensure that the fragment collection hold-down period is considered
> when calculating retransmission timeouts. If the sender's retransmission
> timer fires before the receiver has had time to collect all fragments and
> respond with a Receipt Status Message, the sender will retransmit the
> entire message (or the first fragment per Section 4.1.3), defeating the
> purpose of selective retransmission.
>
> Specifically, when a sender is transmitting a large and fragmented
> message and is aware that selective retransmission may be in use, the
> sender's retransmission timer for that exchange SHOULD be set to a value
> no less than the time required to transmit all fragments (including any
> inter-fragment delays) plus a reasonable allowance for the receiver to
> process the fragments and return a Receipt Status Message.
>
> 4.2.1.5.5. Considerations for Bandwidth-Constrained and High-Latency
> Networks
>
> On satellite communication links and other high-latency, low-bandwidth
> networks, the interaction between the techniques described in this
> document requires particular care.
> These networks exhibit the combination of high propagation delay (often
> 250 ms or more one-way for geostationary links), limited bandwidth that
> makes congestion from spurious retransmissions particularly costly, and
> higher baseline packet loss rates that make selective retransmission most
> valuable.
>
> This creates a tension: the receiver benefits most from selective
> retransmission (because fragments are more likely to be genuinely lost),
> but must also wait longest before requesting it (because fragments take
> longest to arrive). Implementations deployed in these environments SHOULD
> use Adaptive Mode with a hold-down timer of at least one full round-trip
> time of the link and SHOULD err on the side of caution when in doubt.
>
> 4.2.1.5.6. Fragment Count as a Receiver Heuristic
>
> The Total Fragments field in the Encrypted Fragment payload (Section 2.5
> of [RFC7383]) is available to the receiver from the moment the first
> fragment arrives. This value provides a useful implicit signal that the
> receiver MAY use to adjust its fragment collection behaviour without
> requiring any protocol extension or negotiation.
>
> A message fragmented into a small number of fragments (e.g., fewer than
> 20) is likely to be fully transmitted by the sender within a short time
> window, even with inter-fragment delays. A message fragmented into a
> large number of fragments (e.g., 100 or more) will take substantially
> longer to transmit, particularly when the sender is using the
> rate-limiting technique of Section 4.1.2. The receiver can use the Total
> Fragments value to scale its hold-down timer or quiescence period
> accordingly.
>
> The following approach is RECOMMENDED. Implementations SHOULD allow the
> operator to configure a per-fragment delay estimate (in milliseconds)
> representing the expected inter-fragment spacing used by the sender.
> The receiver then calculates an adjusted hold-down timer as:
>
>    adjusted_holddown = base_holddown +
>                        (Total_Fragments * per_fragment_delay)
>
> where base_holddown is the hold-down timer value as described above.
> This ensures that the receiver's timeout window scales linearly with the
> size of the message being received.
>
> Senders can influence receiver behaviour through their choice of fragment
> size, which determines the Total Fragments count.
>
> A sender on a low-latency, high-bandwidth link MAY choose a smaller
> fragment size (producing more fragments) if it determines that the
> receiver or intermediate network can handle the higher packet rate.
> Conversely, a sender on a high-latency or bandwidth-constrained link
> (e.g., satellite communication) SHOULD use a larger fragment size where
> possible to reduce the total number of fragments, thereby reducing both
> the transmission time and the window during which the receiver must wait
> before concluding that fragments are missing.
>
> The following guidance is provided for sender fragment size selection
> based on network characteristics:
>
> - On low-latency, high-bandwidth networks: the sender MAY use the minimum
>   fragment size (i.e., the path MTU minus IKEv2 overhead), as the
>   receiver can absorb a high packet rate and the resulting large fragment
>   count should not cause excessive delay before selective retransmission
>   can engage.
>
> - On moderate-latency Internet paths: the sender SHOULD use the path MTU
>   as the fragment size, which is the default behaviour defined in
>   [RFC7383].
>
> - On high-latency or bandwidth-constrained links: the sender SHOULD avoid
>   producing an unnecessarily large number of fragments. Where the path
>   MTU permits, a fragment size larger than the minimum SHOULD be used.
>   The trade-off is that larger fragments are more costly to retransmit
>   individually if lost, but the reduced fragment count allows the
>   receiver to engage selective retransmission sooner with greater
>   confidence that gaps represent genuine loss.
>
> Implementations SHOULD allow the operator to configure the fragment size
> or to select a network profile (e.g., "low-latency", "internet",
> "satellite") that sets appropriate defaults for both the fragment size
> and the receiver's hold-down parameters.
>
> 4.2.1.5.7. Duplicate Fragment Handling
>
> Regardless of the mode in use, a receiver that has already successfully
> processed a fragment and subsequently receives a duplicate (whether from
> a spurious retransmission or some form of network duplication) MUST
> silently discard the duplicate.
> Implementations MUST NOT treat receipt of a duplicate fragment as an
> error condition.
>
> On Wed, Mar 18, 2026 at 3:03 PM Antony Antony <[email protected]> wrote:
> >
> > Hi Valery,
> >
> > Thanks for taking the time to present your draft at tomorrow's session.
> > I quickly went through your slides and appreciate you including a
> > comparison with our draft. I am sorry for my delayed response!
> > I updated our draft back in January, and didn't get around to
> > responding to it.
> >
> > Thanks for the numbers in your slides. They give a better picture.
> >
> > On Thu, Dec 11, 2025 at 03:19:18PM +0300, Valery Smyslov wrote:
> > > Hi Antony,
> > >
> > > please, see inline.
> > >
> > > > Hi Valery,
> > > > Thank you for the detailed feedback.
> > > >
> > > > I have been looking through the simultaneous-initiation case you
> > > > describe, where both peers have just completed an IKE SA rekey and
> > > > therefore begin with Message ID 0 on each side. One situation can
> > > > be slightly problematic when there are delayed responses; however,
> > > > I don't see any case where the proposed ack would fail to advance
> > > > the negotiation.
> > > >
> > > > Still, to make it clear, at the end I am proposing two
> > > > direction-specific Notifiers instead of one.
> > >
> > > This would help. However, it won't work if some future (imaginary)
> > > IKE extension makes each exchange use a different key (e.g., as
> > > KDF(SK_ex, MSG-ID)).
> >
> > Once this draft is standardized, any such future (imaginary) extension
> > would need to accommodate the existing mechanism regardless. More
> > importantly, your proposal has the same property: the Receipt Status
> > Message is sent with the same Message ID as the original exchange, so
> > you also have two messages sharing a Message ID: the receipt status
> > and the actual IKE response. The concern applies equally to both drafts.
> >
> > As for the Message ID as AEAD counter: yes, implementations need to
> > handle this carefully, but this is less of a protocol correctness issue.
> > Implementations can track the context and create a monotonic counter
> > as the IV.
> >
> > > > Here is how I see the case you described. I am using
> > > > CREATE_CHILD_SA as an example. The analysis would be similar for
> > > > other exchanges too.
> > > >
> > > > 1. Simultaneous CREATE_CHILD_SA requests after rekey
> > > >
> > > > In the simplest case:
> > > >
> > > > ---- IKE SA Rekeyed both ends Message ID 0 Request
> > > > Initiator                            Responder
> > > >
> > > > MID(0) CREATE_CHILD_SA ---->         <------ MID(0) CREATE_CHILD_SA
> > > > FACK(MID=0, response flag=1) --->    <------ FACK(MID=0, response flag=1)
> > > >
> > > > Since each peer knows it has an outstanding request with MID=0,
> > > > the received FACK(MID=0,R=1) can be unambiguously associated with
> > > > its own outstanding request.
> > >
> > > Yes.
> > >
> > > > 2.
> > > > Case where one peer has advanced its CREATE_CHILD_SA exchange
> > > > and the response is lost
> > > >
> > > > A more interesting scenario is when both peers send the
> > > > CREATE_CHILD_SA request, but one peer sends its response and then
> > > > advances its internal state, while the response is lost:
> > > >
> > > > The actual CREATE_CHILD_SA response fragments are lost, and the
> > > > initiator responds with FACK(MID=0, response flag)
> > > >
> > > > MID(0) CREATE_CHILD_SA ---->         <------ MID(0) CREATE_CHILD_SA
> > > > FACK(MID=0, response flag=1) --->    <------ Partial Retransmit (MID=0)
> > > >
> > > > MID(0) CREATE_CHILD_SA response flag=1 ---->
> > > >
> > > >      <------ MID(0) CREATE_CHILD_SA response flag=1 ---->
> > > >                                      <------ FACK(MID=0, response flag=1)
> > > >
> > > > Here, once the responder has advanced past CREATE_CHILD_SA, any
> > > > FACK it receives later clearly corresponds to the response it sent.
> > > > The initiator can correctly attribute that FACK to the outstanding
> > > > response it is waiting for.
> > >
> > > I meant the case:
> > >
> > > MID(0) CREATE_CHILD_SA ---->            <------ MID(0) CREATE_CHILD_SA
> > > FACK(MID=0, response flag=1) ---> (1)   (delayed)
> > >
> > >             <---- MID(0) CREATE_CHILD_SA response flag=1
> > > FACK(MID=0, response flag=1) ---> (2)
> > >                                         (1 received)
> > >
> > > Message (1) is the FACK response to the responder's request, while
> > > message (2) is the FACK response to the responder's response to the
> > > initiator's request. The responder cannot distinguish these two
> > > messages.
> > > I agree that making the content different would help (but see
> > > above), but in general this is a headache to implement (since it
> > > violates the order in which an incoming message is processed: it is
> > > processed in the context of a particular exchange that is determined
> > > before the message is parsed).
> > >
> > > > 3.
> > > > Delayed or misordered FACK messages
> > > >
> > > > I agree there are corner cases where a delayed FACK may arrive late
> > > > and overlap with another exchange with the same MID (0 in this
> > > > case). However, in these cases processing the FACK as a hint rather
> > > > than a state-advancing message does not break protocol correctness.
> > > > At worst, a late FACK would simply cause an extra retransmit of
> > > > fragments that already arrived.
> > > >
> > > > Addressing your core concern: distinguishing request-side vs
> > > > response-side acknowledgments
> > > >
> > > > To address the case where a FACK for a request and a FACK for a
> > > > response may look identical (same MID, same exchange type, same R
> > > > flag), I agree this could lead to an unnecessary ambiguity in
> > > > simultaneous-initiation scenarios.
> > > >
> > > > To resolve this cleanly, I propose defining two separate Notify
> > > > Status Types:
> > > >
> > > > FRAGMENT_ACK_REQ: acknowledgment of fragments belonging to a
> > > > request
> > > >
> > > > FRAGMENT_ACK_RES: acknowledgment of fragments belonging to a
> > > > response
> > > >
> > > > These two notifiers would make the semantic direction explicit,
> > > > eliminating the ambiguity you describe even in simultaneous
> > > > exchanges with identical Message IDs.
> > > >
> > > > More responses below inline.
> > > >
> > > > On Wed, Nov 26, 2025 at 03:24:15PM +0300, Valery Smyslov wrote:
> > > > > Hi Antony,
> > > > >
> > > > > I doubt that this proposal is workable, at least in some
> > > > > situations. Consider the IKE SA was just rekeyed, so that each
> > > > > peer starts its first exchange with Message ID = 0. And consider
> > > > > they simultaneously initiate the same exchange, say
> > > > > CREATE_CHILD_SA. And consider the response messages need
> > > > > fragmentation.
> > > > > Then the "response to response" messages will have the same
> > > > > Message ID (0) and the same exchange type and the same "response
> > > > > flag" as the regular response message for the other exchange.
> > > > > Moreover, they both can have the same content: the FRAGMENT_ACK
> > > > > notify. It is impossible for the receiver to find the exchange
> > > > > this message belongs to.
> > > > > (OK, I can imagine a lot of possible approaches in this
> > > > > situation, e.g., ignore such messages or process them for both
> > > > > exchanges since it is only a hint, but this decreases the value
> > > > > of this extension).
> > > > >
> > > > > In addition, you have to disable (or somehow tweak) the replay
> > > > > protection mechanism in IKEv2, since you should be able to
> > > > > process different messages with the same Message ID.
> > > > > And you already said that the retransmission behavior of
> > > > > responders is also changed.
> > > > >
> > > > > Overall, the proposed solution looks like a protocol hack to me
> > > > > and I'm not sure it is so easy to implement (taking into
> > > > > consideration all possible cases).
> > > > >
> > > > > I think that depending on the nature of packet loss and the
> > > > > maximum size of the message, several approaches are possible.
> > > > >
> > > > > 1. If the message size is a few tens of Kbytes (so that the
> > > > >    number of fragments is a few tens), then the simplest solution
> > > > >    would be either to randomize the order in which fragments are
> > > > >    sent when retransmitted (or just shift them) and/or add some
> > > > >    small delay (20-50 ms) between sending each fragment. This
> > > > >    will cope with the situation when the network is quickly
> > > > >    saturated or the receiver's buffers are too small and the
> > > > >    receiver's performance is insufficient. In this case only the
> > > > >    first few fragments are processed and the rest are dropped.
> > > > >    Both solutions (changing the order of fragments and
> > > > >    introducing delay) should help.
> > > > >    They are both easy to implement and don't require protocol
> > > > >    change.
> > > >
> > > > This is a good idea. Thanks.
> > > > Also note that RFC 7383 states every retransmit must include the
> > > > first segment. Our proposal relaxes this requirement when
> > > > responding to FRAGMENT_ACK_*, because the first is received.
> > >
> > > This is incorrect; RFC 7383 does not contain this requirement.
> > > RFC 7383 says (or tries to say) that when the responder has already
> > > sent the (possibly fragmented) response and it receives some
> > > (retransmitted or delayed) fragments of the request (which the
> > > responder has already processed), then the responder must only
> > > re-send its response if the received fragment number is 1 (the first
> > > fragment).
> > >
> > > Thus, the first fragment has a special meaning for the responder
> > > when it decides whether to re-send the response, but the initiator is
> > > free to send any subset of fragments at any time (as is the
> > > responder).
> > >
> > > > > 2. If the message size is several hundreds of Kbytes (so that
> > > > >    the number of fragments is a few hundreds), then the above
> > > > >    approach might not help. In this situation your proposal may
> > > > >    not help either, because the size of FRAGMENT_ACK can grow so
> > > > >    much that the message containing it would be fragmented
> > > > >    itself. In addition, if the reason for the packet loss is also
> > > > >    network saturation or insufficient buffer size on the
> > > > >    receiver, then even with individual acks the process may still
> > > > >    not converge (you still send a lot of extra data with each
> > > > >    retransmission, which adds to the problem).
> > > > >    In this situation the preferred solution would be to redefine
> > > > >    IKE exchanges, perhaps splitting them into two sub-exchanges,
> > > > >    where a peer sends a series of fragments one by one, each
> > > > >    individually acknowledged (and not all fragments at once).
> > > > >
> > > > > 3.
> > > > >    If the message size is more than 1 Mbyte, then it is not
> > > > >    possible to use UDP with IKE fragmentation in its current form
> > > > >    regardless of how fragments are sent and acknowledged, because
> > > > >    the number of fragments is limited to 2^16, thus TCP should be
> > > > >    used.
> > > >
> > > > Yes. This is out of scope until IKEv2 extends fragment numbers,
> > > > which at this point I think is a simple update to RFC 7383 to
> > > > extend "Total Fragments" and "Fragment Number" from the current
> > > > 16 bits to 32 bits. I tried to write it down! The proposed Fragment
> > > > Ack could support 32-bit versions as well.
> > >
> > > I don't think that extending the fragment number to 2^32 makes
> > > practical sense. With 2^16 and a fragment size around 500 bytes it is
> > > enough to transfer 32 Mbytes of data. I'm very skeptical that even
> > > with the help of acks but w/o any congestion control transferring
> > > that much data will go smoothly.
> > >
> > > > > And if the network just randomly drops packets (I assume there
> > > > > are no congestion problems), then your proposal won't help much
> > > > > (in my opinion).
> > > > >
> > > > > I believe we are now at situation #1. Thus I think that simpler
> > > > > approaches should help.
> > > > > If we sometime reach situation #2 (e.g., if we use Classic
> > > > > McEliece with the smallest public keys), then proposals like
> > > > > yours can be considered (but I prefer less hacky approaches).
> > > >
> > > > I am trying to be a bit less hacky with two notifiers!
> > >
> > > Thinking more about this I came up with an alternative proposal:
> > > https://datatracker.ietf.org/doc/draft-smyslov-ipsecme-ikev2-fragm-large-msg/
> > >
> > > Compared to yours it has (as I believe) the following advantages:
> > > - request/response semantics is preserved - no "response to response"
> > > - retransmission logic is preserved - initiator is always an active
> > >   side
> > > - IKE replay protection is not affected
> > > - no layer violation - the extension can be entirely implemented in
> > >   the IKE fragmentation code, upper layers (e.g., message parsing and
> > >   forming) are not affected
> > > - RFC 7383 PMTU discovery is supported
> > > - traffic overhead is smaller in most cases (but I agree that not in
> > >   all)
> > > - receipt status messages are protected against replays
> > > - no negotiation is needed (not a real advantage, just a feature
> > >   that can be changed in future)
> > >
> > > My proposal also has one small hack (or a trick), but it is not
> > > inherent to the proposal, there are several ways to avoid it
> > > (and perhaps it is not needed at all, this is just in case).
> >
> > The ICV trick is interesting. It is smart, but I wonder whether it
> > would be an interop risk: a non-supporting peer sees an ICV failure
> > and must decide whether to re-check. No negotiation means no clean
> > capability signaling. Using a notifier is my preference. I vote to
> > negotiate.
> >
> > Most of the other points are, in my opinion, a matter of design
> > preference, and I have mine. One concrete reason I strongly prefer
> > ranges over a bitmap: ranges are far easier to inspect in practice,
> > both in Wireshark dissectors and in plain log output, which matters
> > for diagnostics and interop testing.
> > A bitmap requires bit-level decoding; a (start, count) pair is
> > immediately human-readable.
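
To make the bitmap vs ranges comparison concrete, here is a minimal
Python sketch of the same set of received fragments in both encodings.
It is illustrative only: the function names, the MSB-first bit order and
the 1-based fragment numbering in the bitmap are assumptions made for
this example, not the wire format of either draft.

```python
from typing import List, Set, Tuple


def to_bitmap(received: Set[int], total: int) -> bytes:
    """Pack fragment numbers 1..total into a bitmap (assumed MSB-first)."""
    bitmap = bytearray((total + 7) // 8)
    for n in received:
        if not 1 <= n <= total:
            raise ValueError(f"fragment number {n} out of range 1..{total}")
        idx = n - 1  # fragment numbers are 1-based, bit positions 0-based
        bitmap[idx // 8] |= 0x80 >> (idx % 8)
    return bytes(bitmap)


def to_ranges(received: Set[int]) -> List[Tuple[int, int]]:
    """Collapse fragment numbers into (start, count) runs."""
    ranges: List[Tuple[int, int]] = []
    for n in sorted(received):
        if ranges and ranges[-1][0] + ranges[-1][1] == n:
            start, count = ranges[-1]
            ranges[-1] = (start, count + 1)  # extend the current run
        else:
            ranges.append((n, 1))  # start a new run
    return ranges


if __name__ == "__main__":
    received = {1, 2, 3, 4, 5, 9, 10, 16}
    print(to_bitmap(received, 16).hex())  # f8c1
    print(to_ranges(received))            # [(1, 5), (9, 2), (16, 1)]
```

In this example the bitmap is 2 bytes regardless of the loss pattern,
while the range list grows with the number of gaps, so which one is
smaller depends on how the losses are distributed.
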
> >
> > The remaining concerns you raised are addressed in v3 of our draft:
> >
> > I am also open to merging the two approaches: keep Valery's ICV trick
> > to avoid negotiation, but use Notify payloads with ranges instead of a
> > bitmap. This would combine the cleaner diagnostics and human-readable
> > encoding of ranges with the no-negotiation property of Valery's design.
> >
> > Would others in the WG like to weigh in?
> >
> > Looking forward to tomorrow's presentation, and hoping we have time
> > during the session to discuss both drafts.
> >
> > regards,
> > -antony
> >
> > _______________________________________________
> > IPsec mailing list -- [email protected]
> > To unsubscribe send an email to [email protected]
