Re: jumbo frame of GbE and IPv6 -- A proposal

Iljitsch van Beijnum Tue, 26 Jul 2005 09:49:12 -0700

On 26-jul-2005, at 13:46, Perry Lorier wrote:

6. Minimise the resources used.

Agree, except that packets are cheap on a 1000 Mbps LAN, so those don't
count much towards 6.

Packet rate however starts becoming a problem at faster speeds, at gige
it starts becoming a problem for hosts to deal with unless they are
careful.  And not all networks are fast, 3G networks are becoming more
prevalent.  We should not waste resources needlessly :)

Well, the places where jumboframes are worth the trouble are also the
places where a handful of packets won't make a difference. I'm not
sure how fast 3G is, but I believe not more than a few Mbps, so
jumboframes really aren't very useful there because they occupy the
channel for too long. Doubly so on radio networks with their high bit
error rates.

What happens on l2's where not every node can see every other node?

Neighbor discovery fails?

Host A can talk to Host B ok.
Host A can talk to Host C ok.
Host B can't talk to Host C.

This happens in ad hoc wireless networks.  With your system I'm not
entirely sure how you deal with whose "turn" it is next if not all
nodes can see all other nodes. Host A should still be able to talk to
Host B.

Well, a simple way to decide could be a log of the difference in MAC
address. So after host 20 sends its packet, host 28 would wait for 3
seconds and host 36 for 4 seconds. But host 36 hears host 28 and
resets its timer to 3 seconds. If hosts 28 and 36 can't hear each
other, host 36 will send its packet 1 second after host 28 rather
than 3 seconds. No big deal.
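A minimal sketch of that timer rule. The base-2 logarithm is an
assumption inferred from the numbers in the example (a difference of 8
gives 3 seconds, a difference of 16 gives 4 seconds); the function name
and the use of small integers in place of real MAC addresses are only
for illustration.

```python
import math


def wait_seconds(own_id, last_sender_id):
    """Seconds to wait after hearing a packet from last_sender_id.

    Assumes a base-2 log of the address difference, which is what the
    20/28/36 example implies. Differences that are not powers of two
    give fractional waits.
    """
    diff = abs(own_id - last_sender_id)
    if diff == 0:
        raise ValueError("a host does not time itself against its own packet")
    return math.log2(diff)


# Host 20 has just sent its packet.
print(wait_seconds(28, 20))  # host 28 waits 3.0 seconds
print(wait_seconds(36, 20))  # host 36 waits 4.0 seconds
# Host 36 hears host 28 and restarts its timer relative to 28.
print(wait_seconds(36, 28))  # 3.0 seconds
```

If host 36 never hears host 28, it simply keeps its original 4 second
timer and sends 1 second after host 28.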

If A and B are talking to each other and C and D are talking to each
other, why do (A and B) need to talk to C and D?

Ah, but how do you know that A doesn't talk to D, and is never going
to?

How do you know it will in the time before the topology of the network
changes?  Given that the topology of the network changes every time a
host comes and goes, the chance that you'll want to talk to most of the
users during that time is rather low.

Look at it this way: if two routers send out RAs every 10 seconds,
that's one packet every 5 seconds. If 60 hosts all send one packet
every five minutes, that's also one packet every 5 seconds.
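The same arithmetic as a tiny sketch (the function name is made up for
illustration, and it assumes the senders are spread evenly over the
interval):

```python
def seconds_between_packets(senders, interval_seconds):
    """Average gap between packets on the link when each of `senders`
    nodes transmits once per `interval_seconds`."""
    return interval_seconds / senders


routers = seconds_between_packets(2, 10)      # two routers, one RA per 10 s
hosts = seconds_between_packets(60, 5 * 60)   # 60 hosts, one packet per 5 min
print(routers, hosts)  # 5.0 5.0
```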

I'd start at the minimum "MTU" size.

Yes, I thought about this and first trying a 1508 byte packet makes
sense: if jumboframes don't work, you've wasted as little time and
bandwidth as possible. If they do work, you've only wasted 1508 bytes.

A colleague of mine (Matthew Luckie) has done some research into path
MTU's.  He has a work in progress paper
( http://www.wand.net.nz/~mjl12/debugging-pmtud.pdf ) where he
enumerates all the common MTU's he's seen on the Internet.

And reaches a very interesting conclusion! Exchanging per-neighbor
MTUs would really help here.

I'd start with a similar table trying the lowest size, and sending
that, if it's received try the next lowest size and so on until you
don't get a reply. When you don't get a reply try the
previous-mtu-that-worked+1, if that succeeds start a binary search
between previous-mtu-that-worked and the one that didn't.

I partially agree. If you're at a well known boundary and want to
search upward, it makes sense to try that well known boundary +
minimum increment (I say: 4) first. That way, if you can't go beyond
the current boundary, you know so immediately. Next is the highest
possible value. If you can use that one, you're done.

But if previous low + minimum works but maximum doesn't, a mostly
binary search still makes sense. However, it could be a "hinted"
binary search. For instance, if you're searching between 1508 and
9000 (with the target being 4464) a strict binary search would do:


1  1508 yes
2  9000 no
3  5252 no
4  3380 yes
5  4316 yes
6  4784 no
7  4548 no
8  4432 yes
9  4488 no
10 4460 yes
11 4472 no
12 4464 yes
13 4468 no
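A sketch that reproduces the strict sequence above. It assumes a
minimum increment of 4 bytes and rounds each midpoint down to that
increment; works() is a stand-in for sending a real probe and waiting
for the reply or the timeout, and all names are made up for
illustration.

```python
def strict_binary_search(low, high, works, step=4):
    """Return (largest working size, list of sizes probed)."""
    probes = []

    def probe(size):
        probes.append(size)
        return works(size)

    if not probe(low):
        return None, probes          # even the starting size fails
    if probe(high):
        return high, probes          # the maximum works, nothing to search
    # Invariant: low works, high fails.
    while high - low > step:
        # Midpoint offset rounded down to the increment, but at least
        # one increment so every probe lies strictly between low and high.
        offset = max(step, (high - low) // 2 // step * step)
        mid = low + offset
        if probe(mid):
            low = mid
        else:
            high = mid
    return low, probes


# Simulated link whose real limit is 4464 bytes.
mtu, probes = strict_binary_search(1508, 9000, lambda size: size <= 4464)
print(mtu, len(probes))
print(probes)
```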

A hinted binary search could be:

1  1508 yes
2  9000 no
3  4470 no (closest value to binary 5252 target)
4  2048 yes (closest value to binary 2988 target)
5  2052 yes (see if 2048 was our limit)
6  4352 yes (closest value to binary 3260 target)
7  4356 yes (see if 4352 was our limit)
8  4464 yes (closest value to binary 4412 target)
9  4468 no (4464 was our limit)

Note that although the second variant is faster overall, the first one
finds a reasonable candidate (that can already be used at that point)
at try 5, and the second one at try 6.
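One way the hinted variant could be sketched. The hint table holds only
the well known sizes that appear in the example (2048, 4352, 4464,
4470), which is an assumption and not a real list of common MTUs.
works() again stands in for a real probe, and the rule "after a hit on
a hint, try hint + 4" follows the boundary check described earlier.

```python
HINTS = [2048, 4352, 4464, 4470]  # assumed table, taken from the example only


def hinted_binary_search(low, high, works, hints=HINTS, step=4):
    """Return (largest working size, list of sizes probed)."""
    probes = []

    def probe(size):
        probes.append(size)
        return works(size)

    if not probe(low):
        return None, probes
    if probe(high):
        return high, probes
    # Invariant: low works, high fails.
    while high - low > step:
        offset = max(step, (high - low) // 2 // step * step)
        target = low + offset                      # plain binary target
        inside = [h for h in hints if low < h < high]
        # Prefer the well known size closest to the binary target.
        guess = min(inside, key=lambda h: abs(h - target)) if inside else target
        if not probe(guess):
            high = guess
            continue
        low = guess
        # A well known size worked: see whether it is our limit.
        if inside and low + step < high:
            if probe(low + step):
                low += step
            else:
                high = low + step
    return low, probes


mtu, probes = hinted_binary_search(1508, 9000, lambda size: size <= 4464)
print(mtu, len(probes))
print(probes)
```

With the simulated 4464 byte limit this probes the same nine sizes as
the hinted table. For a limit that is not near any hint it falls back
to ordinary binary steps once no hints remain between the bounds.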

In this case your serial bottom-to-top search would probably be a bit
faster, but it has two disadvantages: it takes a long time to find a
high MTU, and it's not good at finding non-standard MTUs.

For a "common" MTU, you only have to endure two timeouts (the next
highest common MTU, and the +1 test). For an uncommon MTU you can
increase the MTU to maximum "common" MTU that's lower than your MTU
quickly, and can endure the timeouts from then on.

Note that with a 100 ms timeout (more than enough) you're done in
less than 2 seconds worst case.

So when system A tries with 3000 bytes (worked with C!) towards B,
B sets an ack flag and tries with 9216, which fails, so A sends a NAK
and tries with 6108, and so on.

Hang on, if they don't receive a packet, how can they know to send a
NAK?  if they're just waiting for a timeout how can they know if the
packet got lost on the way there or on the way back?


Good question.  :-)

If instead of using special "ICMP MTU Probes" we use "ICMP Echo
Request"/"ICMP Echo Reply" messages, there are no changes to any packet
formats needed, all that needs to be done is to have it implemented in
a TCP/IP stack, and the concept is even reusable for IPv4. Other hosts
don't even have to be upgraded to support this either.  magic!


You mean, rely just on ICMP and not announce a bigger MTU in RAs?

I guess you're right, but I wouldn't want to be a 10 Mbps host in an
otherwise 64k jumbo-enabled network, because all those probes would
eat up my bandwidth even though I can't successfully receive them.

Also, I think we want to be nicer to on-link probers than off-link
ones, especially with these large packets.

Stacks would be free to do as you suggest (doing a binary search) or as
I suggest (ramp up and do a binary search only as a last resort).


Yes, this can be left up to the implementers.

So the general approach would be:
* If a packet arrives from a host that is larger than the cached MTU
for that neighbour, increase it to the size of the packet arriving.

Not sure if we want to do this check for every packet. Also, an
attacker could fake the packet in order to do an "MTU attack" on a
non-jumbo enabled host.

* When receiving a ND (but not a NS!), and you have no cached MTU for
that neighbour, you start the MTU discovery process (using any
mechanism for selecting the packet sizes the implementation deems
appropriate, i.e. either yours, or mine, or if someone can come up with
a method that's even better than ours, they could use that!)


With an MTU option in it. And why not NS?

No, the announcement "the switch can handle 4500 bytes" wouldn't have
anything to do with "I can handle 1500".

Which switch? I live in a flat with 3 other people, we have at least 4
devices that act like switches on one segment.  (2 switches, a voip
phone (you can daisy chain a PC off it), and an AP).  I have no idea
what the maximum MTU of all those switches is.


If all of those switches announce their MTU, we're in business.

On the other hand, if we do an MTU search we don't need this
information because we'll find out ourselves.

If we don't do an MTU search and the switches don't announce their
MTU, you're probably not going to use jumboframes on such a network...

It would be even better if we could ask the switch what our port
supports, but I'm not sure how to do this in such a way that a switch
that doesn't support this protocol floods the request so the results
are meaningless.

Hrm, so Ethernet has capability negotiation (which is how speed,
duplex, pause frame support etc is negotiated).  I have no idea if it
says if the switch supports jumbo gram, IEEE specs make my head hurt.

Autonegotiation only does 16 bits or something like that, no room to
include the MTU there. Gigabit does have some in-band stuff like flow
control, maybe that can be reused. But you always run the risk that a
dumb switch just forwards those packets and screws up the negotiation.


--------------------------------------------------------------------
IETF IPv6 working group mailing list
[email protected]
Administrative Requests: https://www1.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------
