[see conclusion at the end if you skip some stuff]

After the feedback this afternoon I read RFC 4821, which is about discovering the path MTU by probing rather than depending on ICMP messages.

This approach has the advantage that it can work just as well over a layer 2 path with an unknown MTU as over a layer 3 path where traditional path MTU discovery doesn't work because ICMP "too big" messages aren't sent or received.

This means that it would be possible for hosts implementing this mechanism to simply set the maximum MTU search size to their local non-standard MTU and everything will work.
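
To make the probing idea concrete, here is a rough sketch. It is not the RFC 4821 algorithm itself (that runs inside the transport and uses the loss of probe segments as its signal), just a plain binary search between a size known to work and the configured maximum search size. The `path_accepts` callback is hypothetical and stands in for "send a probe of this size and see whether it gets through".

```python
from typing import Callable


def search_mtu(path_accepts: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest size in [low, high] that the path accepts.

    Assumes `low` is already known to work and that acceptance is
    monotonic (if a size fails, every larger size fails too).
    """
    while low < high:
        # Probe the upper middle so the loop always makes progress.
        probe = (low + high + 1) // 2
        if path_accepts(probe):
            low = probe        # probe got through: raise the floor
        else:
            high = probe - 1   # probe was lost: lower the ceiling
    return low


if __name__ == "__main__":
    # Hypothetical path that silently drops anything above 4470 bytes.
    found = search_mtu(lambda size: size <= 4470, low=1500, high=9000)
    print(found)
```

Setting `high` to the local non-standard MTU is the "maximum MTU search size" mentioned above; every failed probe in this search is an oversized packet on the wire.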

In my approach, I wanted to avoid, as much as possible, sending oversized packets that don't make it through the layer 2 network, because of the problems this may cause. In the degenerate case, a 10/100/1000 Mbps host sends a burst of oversized packets in a short time because a number of TCP sessions are searching for the MTU, and this happens on an old 10 Mbps network where it leads to some kind of exception state with further negative impact (hubs/switches that disconnect ports with too many errors, that kind of thing).

Another difference is that in my draft, routers can announce a TCP MSS value but the MTU discovery overrides this information on the local subnet, which makes it easy to stick to 1500 byte (or smaller) packets across the net but use larger packets locally. Routers can also announce a maximum allowed MTU so it's easy to make sure that hosts don't send packets larger than a certain size administratively if desired. Obviously it's also possible to simply announce the largest possible MSS so large packets can be used across the internet (I think this may have been unclear this afternoon).
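
As an illustration of how these announcements could interact, here is a minimal sketch. It is not text from the draft. The function name, parameter names, the 1500 byte fallback and the 40 byte header overhead (IPv4 + TCP without options) are all assumptions made for the example.

```python
from typing import Optional

IP_TCP_OVERHEAD = 40   # assumption: IPv4 + TCP headers, no options
DEFAULT_MTU = 1500     # assumption: fallback when nothing is known


def tcp_mss(on_link: bool,
            local_mtu: int,
            neighbor_mtu: Optional[int] = None,
            router_mss: Optional[int] = None,
            admin_max_mtu: Optional[int] = None) -> int:
    """Pick a TCP MSS for one destination (hypothetical decision logic)."""
    # The administrative maximum, if announced, caps everything.
    limit = local_mtu if admin_max_mtu is None else min(local_mtu, admin_max_mtu)

    if on_link:
        # On the local subnet the discovered neighbor MTU overrides
        # the router-announced MSS.
        mtu = neighbor_mtu if neighbor_mtu is not None else DEFAULT_MTU
        return min(limit, mtu) - IP_TCP_OVERHEAD

    # Off-link: use the MSS announced by the router, if there is one.
    if router_mss is not None:
        return min(limit - IP_TCP_OVERHEAD, router_mss)
    return min(limit, DEFAULT_MTU) - IP_TCP_OVERHEAD


if __name__ == "__main__":
    print(tcp_mss(True, 9000, neighbor_mtu=9000, router_mss=1460))  # local neighbor
    print(tcp_mss(False, 9000, router_mss=1460))                    # across the net
```

With these inputs a jumbo-capable neighbor gets large segments while off-link traffic stays at the announced MSS; announcing a larger `router_mss` is what would allow large packets across the internet.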

Last but not least, the RFC 4821 mechanism must be implemented per transport protocol, while the mechanism in my draft works at the IP layer so it doesn't introduce new logic in transports.

The idea of having switches send "too big" messages isn't very attractive for three reasons:

1. This isn't very robust at the IP layer with traditional PMTUD
2. A node could send packets that are so large that the switch can't receive them so it's not possible to send an ICMP message back either
3. Nodes would need to know whether the switches support this before they can send larger packets, which is more or less the same reason why most subnets aren't configured for jumbo frames today

First reaction to issues with neighbor discovery over tunnels: tunnels have problems with MTUs in general and PMTUD in particular. There are many opportunities for problems, but I think in practice the mechanism I proposed wouldn't lead to much additional trouble because the MTU for the tunnel interface is generally low enough that the mechanism isn't used anyway, and probing + neighbor unreachability detection (for IPv6) will make sure it's possible to avoid problems and/or recover from them.

Multiple paths with multiple MTUs: you can't have loops in your ethernet topology, so the only way to do this is with 802.3ad link aggregation. As far as I can tell, at least some Cisco equipment makes sure bundled links all use the same MTU. Not sure if 802.3ad says anything about this. Also, switches don't send packets belonging to the same session over different links in a bundle to avoid packet reordering. So if an MTU failure occurs, it will be consistent. Because the actual traffic and the ICMP probe message aren't necessarily the same "session" it's possible that MTU probes and data traffic see different MTUs. Neighbor unreachability detection will have to detect the problem so the neighbor MTU is reset.
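
To illustrate why a failure is consistent per session but may differ between probe and data, here is a small sketch of hash-based link selection. Actual switches use vendor-specific hash inputs and functions; the CRC32 over a flow tuple below is purely an assumption for the example, as are the addresses and ports.

```python
import zlib


def pick_link(flow: tuple, link_count: int) -> int:
    """Map a flow to one link of a bundle (hypothetical hash scheme)."""
    key = "|".join(str(field) for field in flow).encode()
    # Same flow tuple -> same hash -> same link, so no reordering.
    return zlib.crc32(key) % link_count


if __name__ == "__main__":
    data_flow = ("2001:db8::1", "2001:db8::2", "tcp", 49152, 80)
    probe_flow = ("2001:db8::1", "2001:db8::2", "icmpv6")

    # Every packet of the TCP session takes the same link...
    print(pick_link(data_flow, 2), pick_link(data_flow, 2))
    # ...but the ICMP probe hashes independently and may take another.
    print(pick_link(probe_flow, 2))
```

If the links in the bundle had different MTUs, the probe could succeed on one link while the data session fails on the other, which is the case neighbor unreachability detection would have to catch.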

About 9000 bytes is not enough: all MTU fields are 32 bits in the draft. :-)

Someone made me aware of this:

http://grouper.ieee.org/groups/802/3/frame_study/index.html

This effort doesn't increase the payload size of ethernet packets, though.

My conclusion:

Wide-scale implementation of RFC 4821 makes the MTU probing packets unnecessary, but this and the other options and messages can still be useful for several reasons:

- skip probing steps because neighbor MTU is known immediately
- allow administrators to limit MTU sizes
- use different MTUs for different link speeds for jitter/delay control and interaction with nodes/switches with limited capabilities
- allow unmodified transports to use larger packets
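
On the link speed point, the arithmetic behind the jitter/delay concern is simple. The sketch below computes how long one packet occupies the wire; it ignores Ethernet framing overhead (preamble, header, FCS, inter-frame gap), so the numbers are slightly optimistic. The function name is mine.

```python
def serialization_delay_ms(packet_bytes: int, link_mbps: float) -> float:
    """Time in milliseconds to put one packet on a link of the given speed."""
    bits = packet_bytes * 8
    return bits / (link_mbps * 1000)  # Mbps * 1000 = bits per millisecond


if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        for size in (1500, 9000):
            delay = serialization_delay_ms(size, mbps)
            print(f"{size:5d} bytes at {mbps:4d} Mbps: {delay:.3f} ms")
```

A 9000 byte packet at 10 Mbps blocks the link for about 7.2 ms versus about 1.2 ms for 1500 bytes, while at 1000 Mbps the same 9000 bytes take about 0.072 ms, which is why a per-speed MTU can make sense.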

I'm thinking that it's probably possible and desirable to make all messages and options optional, with the exception of something that allows administrators to limit the MTU subnet-wide with one setting. But maybe cases can be made for completely removing some messages or options because it's unlikely they'll be implemented or provide many benefits if implemented. However, please note that although the number of new options and messages may seem a bit high, the way in which they work is actually very straightforward, with very simple decision making logic and only a single new timer introduced.

The goal is to allow the use of larger packets between supporting nodes on a subnet. Whatever gets that done without breaking any old stuff that's reasonably still in use is fine by me.


_______________________________________________
Int-area mailing list
[email protected]
https://www1.ietf.org/mailman/listinfo/int-area
