[IPsec] ipsecme-ikev2-mtu-dect

Michael Richardson Tue, 16 Aug 2022 13:40:05 -0700

TL;DR>  please adopt the document now so that the WG can work on it.
        I believe in early adoption of documents.


Hi, I watched the IPsecME WG recording from IETF114 today, as I had a conflict
with the ANIMA WG.  I saw the conversation about MTU and about getting early
Transport Area review, and then I read the document.

First, I think that the Introduction is a bit confused about how things work
when there is an IPsec tunnel, and about the history of when, how, and why
Packet Too Big messages are trusted or untrusted.  The WG should not ask for
Transport Area review until this part is fixed.  Second, IPv6 is mentioned
only once, yet the case of IPv6 over an IPv4 ESP tunnel is common and likely
to increase as demand for always-on IPv6 rises.
This document won't get anywhere without an IPv6 section.

Third, I have actually spent a lot of time, going back to 2003, on the
question of fragmentation vs. the DF bit for IPsec tunnels.  The situation is
very different for traffic in the tunnel vs. IKEv2 messages, btw, and not just
because they might take different paths inside the network.
In particular, FreeS/WAN's KLIPS stack intentionally ignored the inner DF bit:
it put the result into a too-big ESP packet and then fragmented that, i.e., it
did not copy the DF bit from the inner header to the outer one.

This was done for operational reasons, but the incidence of fragments being
dropped has since increased, and of course, IPv6 does not support this at all.
One concern from transport people was that on a 1 Gb/s network (high speed at
the time), the 16-bit IPv4 fragment ID space cycled quite quickly, so it was
quite possible for a fragment of one ESP packet to get stuck in the reassembly
queue, and then for a fragment of another packet with the same fragment ID to
arrive and be reassembled with it.
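To put a number on the fragment-ID concern, here is a back-of-the-envelope sketch. It assumes that every datagram between the two gateways consumes one value of the 16-bit IPv4 Identification field, that the link is saturated with 1500-byte packets, and a 30 s reassembly timeout (the Linux `ipfrag_time` default); real stacks and traffic mixes differ. `id_wrap_seconds` is a hypothetical helper, not an existing API.

```python
# Back-of-the-envelope sketch: how fast does the IPv4 Identification field
# wrap for one (src, dst, protocol) tuple, such as a single ESP tunnel?
# Assumptions: every datagram consumes one ID and the link is saturated.

ID_SPACE = 2 ** 16           # the IPv4 Identification field is 16 bits wide
REASSEMBLY_TIMEOUT_S = 30.0  # assumed timeout (e.g. the Linux ipfrag_time default)


def id_wrap_seconds(link_bps: float, packet_bytes: int) -> float:
    """Seconds until the ID counter repeats on a saturated link."""
    packets_per_second = link_bps / (packet_bytes * 8)
    return ID_SPACE / packets_per_second


if __name__ == "__main__":
    for rate_bps in (10e6, 100e6, 1e9, 10e9):
        wrap = id_wrap_seconds(rate_bps, 1500)
        print(f"{rate_bps / 1e6:>6.0f} Mb/s: IDs wrap every {wrap:.3f} s, "
              f"{REASSEMBLY_TIMEOUT_S / wrap:.1f} times per reassembly timeout")
```

At 1 Gb/s this gives roughly 0.79 s per wrap, so under these assumptions a stale fragment can sit in the reassembly queue while the ID space cycles dozens of times.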

If this happened to a bare TCP segment, the mis-assembled segment could pass
the TCP checksum and corrupt the TCP flow; this situation had been observed in
the field, which is why transport people really, really wanted to avoid
fragmentation.  Now, we never did get 9000-byte (jumbo) Ethernet reliably
deployed, although most Ethernet switches and adapters support it, and one can
raise the MTU locally and get much better throughput within an enterprise.
The lack of credible MTU estimation means that if you do this, you likely wind
up failing all non-local connections.
But the discussion at the time was that a mis-assembled ESP packet was caught
by the cryptographic integrity check (HMAC-FOO), which is far, far stronger
than the TCP checksum, and so fragmenting ESP packets was not really that
much of a risk.
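The argument that ESP's integrity check makes mis-assembly harmless can be illustrated with a small sketch. It uses HMAC-SHA-256 truncated to 128 bits (the ESP integrity transform from RFC 4868) as a concrete stand-in for "HMAC-FOO", and a toy splice in which the stale fragment holds the same 16-bit words as the real one, in a different order. Because the ones'-complement Internet checksum (RFC 1071) is order-independent, that splice passes the checksum while the ICV rejects it. The key, payload, and helper names are illustrative only, and the ICV here is computed over the payload rather than over a real ESP packet.

```python
import hashlib
import hmac
import os


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit words (as in TCP/UDP)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def esp_icv(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA-256 truncated to 128 bits, standing in for the ESP ICV."""
    return hmac.new(key, data, hashlib.sha256).digest()[:16]


key = os.urandom(32)         # hypothetical SA integrity key
original = bytes(range(64))  # payload the sender protected, sent as 2 fragments

# Stale second fragment from an unrelated packet: here, the same 16-bit words
# as the real second half but in reverse order (a contrived checksum collision).
words = [original[i:i + 2] for i in range(32, 64, 2)]
spliced = original[:32] + b"".join(reversed(words))

print("checksum passes:", internet_checksum(spliced) == internet_checksum(original))
print("ICV passes:     ", hmac.compare_digest(esp_icv(key, spliced),
                                              esp_icv(key, original)))
```

For random splices the 16-bit checksum is fooled roughly once in 2^16 tries, whereas forging a 128-bit ICV without the key is computationally infeasible.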

But this practice breaks PLPMTUD for that hop!
I tried hard during the work on RFC8200 and RFC8504 to get PLPMTUD made MTI
(mandatory to implement) and the default method, but there was pushback.  "We
don't have enough data," people said.  I also tried to get it turned on in
Linux distros, and to have the Linux kernel ship with it enabled by default,
but again: "not enough data".
I asked a few people at the BigTech companies whether they had data or would
run an experiment, but thanks to TCP Segmentation Offload (TSO), the last thing
they EVER want is for a TCP segment to get rejected and to take a retransmit
delay.  The TXOPs were just too valuable, so they typically set their MTU
(TCP MSS) to 1400 or so, so that they never experience this.

There are two MTUs that actually matter:
1) the MTU of the links between the two gateways.
2) the MTU of the links behind the receiving gateway.  It is easy to
   misconfigure the receiving gateway so that the ICMP PTB it generates does
   not fit into the tunnel and gets dropped.  I don't know if that's still an
   issue with gateways, but it's a common misconfiguration with a Linux kernel.
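For the first of those MTUs, a rough sketch of tunnel-mode ESP overhead shows how much room is left for inner packets, including the IPv6-over-IPv4 case. It assumes no IP options or extension headers, no NAT-T UDP encapsulation unless `udp_encap=True`, no TFC padding, and two example transforms: AES-GCM with an 8-byte IV and 16-byte ICV, and AES-CBC with HMAC-SHA-256-128. The helper names and the `Transform` type are hypothetical.

```python
import math
from dataclasses import dataclass

ESP_HDR = 8              # SPI (4) + sequence number (4)
ESP_TRAILER = 2          # pad length (1) + next header (1)
UDP_HDR = 8              # only with NAT-T UDP encapsulation
IP_HDR = {4: 20, 6: 40}  # IP header sizes, assuming no options/extension headers


@dataclass(frozen=True)
class Transform:
    iv: int     # bytes of IV carried in each packet
    block: int  # ciphertext alignment (cipher block size, or 4 for GCM)
    icv: int    # integrity check value length


AES_GCM_16 = Transform(iv=8, block=4, icv=16)
AES_CBC_HMAC_SHA256 = Transform(iv=16, block=16, icv=16)


def max_inner_packet(outer_mtu: int, outer_ver: int, t: Transform,
                     udp_encap: bool = False) -> int:
    """Largest inner IP packet that fits in one unfragmented ESP packet."""
    room = outer_mtu - IP_HDR[outer_ver] - (UDP_HDR if udp_encap else 0)
    room -= ESP_HDR + t.iv + t.icv
    encrypted = room - room % t.block  # inner + padding + trailer, aligned
    return encrypted - ESP_TRAILER


def min_outer_mtu(inner_len: int, outer_ver: int, t: Transform,
                  udp_encap: bool = False) -> int:
    """Smallest outer path MTU that carries inner_len without fragmenting."""
    encrypted = math.ceil((inner_len + ESP_TRAILER) / t.block) * t.block
    return (IP_HDR[outer_ver] + (UDP_HDR if udp_encap else 0)
            + ESP_HDR + t.iv + encrypted + t.icv)


def tcp_mss(inner_mtu: int, inner_ver: int) -> int:
    """MSS for TCP without options inside the tunnel."""
    return inner_mtu - IP_HDR[inner_ver] - 20


if __name__ == "__main__":
    inner = max_inner_packet(1500, 4, AES_GCM_16)
    print("inner MTU over a 1500-byte IPv4 path:", inner)
    print("IPv4 TCP MSS inside it:", tcp_mss(inner, 4))
    print("outer IPv4 MTU needed for a 1280-byte IPv6 packet:",
          min_outer_mtu(1280, 4, AES_GCM_16))
```

With AES-GCM over a 1500-byte IPv4 path this leaves 1446 bytes for the inner packet, i.e. an inner IPv4 TCP MSS of 1406, so the "1400 or so" MSS mentioned above fits with a little headroom. Carrying a minimum-size 1280-byte IPv6 packet unfragmented needs an outer IPv4 path MTU of at least 1336 bytes under these assumptions.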

I think that an option that enables a receiving gateway to tell the sending
gateway what it considers a too-big packet is very useful, even if we have
IP-TFS to mitigate the effects of too-big packets.
That judgment can be based upon many things.

As Tero observed, a receiving gateway could observe the size of the ESP
fragments that it successfully receives, and could conclude that the link is
*at least* that big.  Of course, an intermediate fragmenter could split the
packet into two equal pieces rather than a full-sized piece and a small
remainder, so this is only a lower bound.  There are other heuristics, and
yes, we can use PLPMTUD on our ESP flow: we can send probe packets that get
bigger until we discover the true size.
None of this requires new standards, but the MTU-observed notify is useful.
(And yes, it needs to be acknowledged, because all IKEv2 messages need to be
acknowledged, but it's just another notify; I assume that Daniel simply
omitted the ack from his slide.)
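The two receiver-side ideas above (a lower bound from observed fragment sizes, then probes that grow until they fail) can be combined into one estimator. This sketch runs a binary search in the style of PLPMTUD (RFC 4821, RFC 8899) between the confirmed size and a ceiling. The `probe` callback, which would send a padded probe and report whether the peer confirmed it, is hypothetical, as are the class and method names; it also assumes a monotonic path, where everything up to the path MTU gets through.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EspPathMtuEstimator:
    """Hypothetical per-SA path MTU estimator (outer IP packet sizes)."""
    floor: int           # e.g. 576 for an IPv4 path, 1280 for IPv6
    ceiling: int = 1500  # largest size worth probing, e.g. the local link MTU
    confirmed: int = field(init=False, default=0)
    probes_sent: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.confirmed = self.floor  # assume the protocol minimum always works

    def observe_fragment(self, fragment_len: int) -> None:
        """A fragment of fragment_len bytes arrived intact, so the path carries
        at least that much. Only a lower bound: the fragmenter may have split
        the packet evenly instead of sending one full-sized piece."""
        self.confirmed = max(self.confirmed, min(fragment_len, self.ceiling))

    def search(self, probe: Callable[[int], bool]) -> int:
        """Binary search for the largest size in [confirmed, ceiling] that
        probe() reports as delivered; assumes a monotonic path."""
        lo, hi = self.confirmed, self.ceiling
        while lo < hi:
            mid = (lo + hi + 1) // 2  # round up so the loop always progresses
            self.probes_sent += 1
            if probe(mid):
                lo = mid      # mid got through: raise the lower bound
            else:
                hi = mid - 1  # mid was lost: the path MTU is below mid
        self.confirmed = lo
        return lo


if __name__ == "__main__":
    true_path_mtu = 1420  # simulated path, unknown to the estimator
    est = EspPathMtuEstimator(floor=576)
    est.observe_fragment(1000)  # e.g. a fragment seen on the ESP flow
    print(est.search(lambda size: size <= true_path_mtu), est.probes_sent)
```

Seeding the search with the fragment-derived lower bound shrinks the interval to probe. A real implementation would also need probe timeouts and periodic re-probing, as RFC 8899 describes.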

I would drop much of Section 4.1.
Section 4.2 should be aware that the IPv6 minimum MTU is 1280, and that the
minimum reasonable IPv4 MTU is typically cited as 576.
I would remove "IP4_" from the notification message names.
I would remove all mention of PMTUD.
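If the notification names drop "IP4_", a receiver of the notify still has to judge a reported value against the floor for the path's address family. Here is a minimal sketch of one possible policy, assuming the notify carries a single integer MTU for the outer path: ignore values below 1280 for IPv6 (RFC 8200) or below the commonly cited 576 for IPv4, and values above an assumed cap of 65535. The function name and the choice to ignore rather than clamp are assumptions, not text from the draft.

```python
from typing import Optional

# Floors from the discussion: the IPv6 minimum link MTU (RFC 8200) and the
# commonly cited minimum reasonable IPv4 MTU.
MIN_MTU = {4: 576, 6: 1280}
MAX_MTU = 65535  # assumed upper sanity cap


def sanitize_reported_mtu(value: int, ip_version: int) -> Optional[int]:
    """Return the reported MTU if plausible for the outer path's address
    family, else None so the caller keeps its current estimate."""
    if ip_version not in MIN_MTU:
        raise ValueError(f"unknown IP version {ip_version}")
    if not MIN_MTU[ip_version] <= value <= MAX_MTU:
        return None
    return value
```

Ignoring an implausible value, rather than clamping it to the floor, keeps a buggy peer from pushing the tunnel below what the address family guarantees; clamping is an equally defensible choice.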

-- 
Michael Richardson <mcr+i...@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-

_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec
