[Int-area] Larger MTUs

Iljitsch van Beijnum Mon, 23 Jul 2007 16:48:26 -0700

[see conclusion at the end if you skip some stuff]

After the feedback this afternoon I read RFC 4821, which is about discovering the path MTU by probing rather than depending on ICMP messages.

This approach has the advantage that it can work just as well over a layer 2 path with an unknown MTU as over a layer 3 path where traditional path MTU discovery doesn't work because ICMP "too big" messages aren't sent or received.

This means that it would be possible for hosts implementing this mechanism to simply set the maximum MTU search size to their local non-standard MTU and everything will work.
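To illustrate the probing idea, here is a minimal sketch of a search for the largest packet size that makes it through. The binary search strategy, the function names, and the `probe` callback are assumptions made for illustration; neither RFC 4821 nor my draft prescribes them. The simulated path simply drops anything above a fixed size.

```python
from typing import Callable


def search_path_mtu(probe: Callable[[int], bool], floor: int, ceiling: int) -> int:
    """Return the largest size in [floor, ceiling] for which probe(size) succeeds.

    Assumptions (hypothetical, for illustration only):
    - `floor` is a size already known to work (e.g. the standard 1500),
    - `ceiling` is the host's local, possibly non-standard, MTU,
    - if a size works, every smaller size works too.
    """
    lo, hi = floor, ceiling
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            lo = mid              # probe got through: the MTU is at least mid
        else:
            hi = mid - 1          # probe was lost: the MTU is below mid
    return lo


def make_fake_path(true_mtu: int) -> Callable[[int], bool]:
    """Simulated path that silently drops packets larger than true_mtu."""
    return lambda size: size <= true_mtu


if __name__ == "__main__":
    # A host with a 9000-byte local MTU behind a path that only passes 4470.
    print(search_path_mtu(make_fake_path(4470), floor=1500, ceiling=9000))
```

Every failed probe in this search is an oversized packet that the layer 2 network has to drop.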

In my approach, I wanted to avoid sending oversized packets that don't make it through the layer 2 network as much as possible to avoid the problems this may cause. In the degenerate case, a 10/100/1000 Mbit host sends a bunch of oversized packets in a short time because a number of TCP sessions are searching for the MTU, and this happens on an old 10 Mbps network where it leads to some kind of exception state with further negative impact. (Hubs/switches that disconnect ports with too many errors, that kind of thing.)

Another difference is that in my draft, routers can announce a TCP MSS value but the MTU discovery overrides this information on the local subnet, which makes it easy to stick to 1500 byte (or smaller) packets across the net but use larger packets locally. Routers can also announce a maximum allowed MTU, so it's easy to make sure administratively that hosts don't send packets larger than a certain size, if desired. Obviously it's also possible to simply announce the largest possible MSS so large packets can be used across the internet (I think this may have been unclear this afternoon).
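The precedence described above can be sketched roughly as follows. This is only one reading of the paragraph, not the draft's actual logic: the function name, the parameters, the 1500-byte default, and the 40 bytes of IPv4 + TCP header overhead (20 + 20, no options) are all assumptions made for the example.

```python
from typing import Optional

STANDARD_MTU = 1500        # assumed fallback when nothing else is known
IPV4_TCP_OVERHEAD = 40     # 20-byte IPv4 header + 20-byte TCP header, no options


def packet_size_limit(
    local_mtu: int,
    on_link: bool,
    neighbor_mtu: Optional[int] = None,   # learned by probing the neighbor
    router_mss: Optional[int] = None,     # TCP MSS announced by the router
    admin_max_mtu: Optional[int] = None,  # maximum allowed MTU announced by the router
) -> int:
    """Hypothetical helper: largest packet to send toward a destination."""
    if on_link and neighbor_mtu is not None:
        # On the local subnet the discovered neighbor MTU overrides the MSS.
        limit = neighbor_mtu
    elif not on_link and router_mss is not None:
        # Off-link traffic follows the router's announced MSS.
        limit = router_mss + IPV4_TCP_OVERHEAD
    else:
        limit = STANDARD_MTU

    limit = min(limit, local_mtu)
    if admin_max_mtu is not None:
        # The administrative cap applies no matter what was discovered.
        limit = min(limit, admin_max_mtu)
    return limit


if __name__ == "__main__":
    print(packet_size_limit(9000, True, neighbor_mtu=9000, router_mss=1460))   # local: 9000
    print(packet_size_limit(9000, False, router_mss=1460))                     # remote: 1500
    print(packet_size_limit(9000, True, neighbor_mtu=9000, admin_max_mtu=4000))  # capped: 4000
```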

Last but not least, the RFC 4821 mechanism must be implemented per transport protocol, while the mechanism in my draft works at the IP layer so it doesn't introduce new logic in transports.

The idea of having switches send "too big" messages isn't very attractive for three reasons:


1. This isn't very robust at the IP layer with traditional PMTUD

2. A node could send packets that are so large that the switch can't receive them so it's not possible to send an ICMP message back either

3. Nodes would need to know whether the switches support this before they can send larger packets, which is more or less the same reason why most subnets aren't configured for jumbo frames today

First reaction to issues with neighbor discovery over tunnels: tunnels have problems with MTUs in general and PMTUD in particular. There are many opportunities for problems, but I think in practice the mechanism I proposed wouldn't lead to much additional trouble because the MTU for the tunnel interface is generally low enough that the mechanism isn't used anyway, and probing + neighbor unreachability detection (for IPv6) will make sure it's possible to avoid problems and/or recover from them.

Multiple paths with multiple MTUs: you can't have loops in your ethernet topology, so the only way to do this is with 802.3ad link aggregation. As far as I can tell, at least some Cisco equipment makes sure bundled links all use the same MTU. Not sure if 802.3ad says anything about this. Also, switches don't send packets belonging to the same session over different links in a bundle to avoid packet reordering. So if an MTU failure occurs, it will be consistent. Because the actual traffic and the ICMP probe message aren't necessarily the same "session" it's possible that MTU probes and data traffic see different MTUs. Neighbor unreachability detection will have to detect the problem so the neighbor MTU is reset.
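A rough sketch of why a session sees a consistent MTU while a probe may not: link selection in a bundle is typically a deterministic function of the flow's addressing fields. The CRC32 hash, the field choice, the addresses, and the per-link MTUs below are all assumptions for illustration; real switches use vendor-specific hashes.

```python
import zlib


def pick_link(src: str, dst: str, protocol: str,
              src_port: int, dst_port: int, n_links: int) -> int:
    """Hypothetical hash-based link selection for a bundle of n_links links."""
    key = f"{src}|{dst}|{protocol}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


if __name__ == "__main__":
    # Assumed misconfigured bundle: the two member links have different MTUs.
    link_mtus = [9000, 1500]

    # A TCP session always hashes to the same link, so any failure is consistent.
    data_link = pick_link("192.0.2.1", "192.0.2.2", "tcp", 49152, 80, len(link_mtus))

    # An ICMP probe has no ports and a different protocol, so it is a different
    # "session" and may (or may not) land on another link.
    probe_link = pick_link("192.0.2.1", "192.0.2.2", "icmp", 0, 0, len(link_mtus))

    print("data sees MTU", link_mtus[data_link])
    print("probe sees MTU", link_mtus[probe_link])
```

When the two indexes differ, the probe result says nothing about what the data traffic experiences, which is the case neighbor unreachability detection has to catch.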

About "9000 bytes is not enough": all MTU fields are 32 bits in the draft. :-)


Someone made me aware of this:

http://grouper.ieee.org/groups/802/3/frame_study/index.html

This effort doesn't increase the payload size of ethernet packets, though.


My conclusion:

Wide scale implementation of RFC 4821 makes the MTU probing packets unnecessary, but this and the other options and messages can still be useful for several reasons:


- skip probing steps because neighbor MTU is known immediately
- allow administrators to limit MTU sizes
- use different MTUs for different link speeds for jitter/delay control and interaction with nodes/switches with limited capabilities
- allow unmodified transports to use larger packets
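The link speed point comes down to simple arithmetic: the time a packet occupies the wire is its size in bits divided by the link rate. A small sketch, with function names and the delay budget chosen purely for illustration:

```python
def serialization_delay_ms(packet_bytes: int, link_bps: int) -> float:
    """Time in milliseconds that one packet occupies a link of the given bit rate."""
    return packet_bytes * 8 / link_bps * 1000.0


def largest_packet_within(delay_budget_us: int, link_bps: int) -> int:
    """Largest packet (bytes) whose serialization delay fits a budget in microseconds.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return delay_budget_us * link_bps // 8_000_000


if __name__ == "__main__":
    for rate_name, bps in [("10 Mbps", 10_000_000),
                           ("100 Mbps", 100_000_000),
                           ("1 Gbps", 1_000_000_000)]:
        # A 9000-byte packet takes about 7.2 ms at 10 Mbps but only about
        # 0.072 ms at 1 Gbps; a 1500-byte packet takes about 1.2 ms at 10 Mbps.
        print(rate_name,
              round(serialization_delay_ms(1500, bps), 4), "ms for 1500 bytes,",
              round(serialization_delay_ms(9000, bps), 4), "ms for 9000 bytes")

    # With an assumed 1200 microsecond budget per packet:
    print(largest_packet_within(1200, 10_000_000))      # 1500 bytes at 10 Mbps
    print(largest_packet_within(1200, 1_000_000_000))   # 150000 bytes at 1 Gbps
```

So a 9000-byte packet at gigabit speed delays other traffic far less than a standard 1500-byte packet does at 10 Mbps.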

I'm thinking that it's probably possible and desirable to make all messages and options optional, with the exception of something that allows administrators to limit the MTU subnet-wide with one setting. But maybe cases can be made for completely removing some messages or options because it's unlikely they'll be implemented or provide many benefits if implemented. However, please note that although the number of new options and messages may seem a bit high, the way in which they work is actually very straightforward, with very simple decision making logic and only a single new timer introduced.

The goal is to allow the use of larger packets between supporting nodes on a subnet. Whatever gets that done without breaking any old stuff that's reasonably still in use is fine by me.



