Re: [Cerowrt-devel] Baby jumbo frames support?

dpreed Thu, 21 Jun 2012 07:25:40 -0700

I understand Dave Taht's long lecture - actually understood it years ago.  But 
frame aggregation is not the same thing as jumbo frames in a multi-technology 
Ethernet LAN.   Jumbo frames provide a way to exploit *end-to-end* frame sizes 
greater than 1500 bytes.  That means the source and destination TCPs get frames 
that are "whole" (and not random subassemblies of frames that may arrive close 
together in time).
 
9000 byte frames were invented for 1 GigE transports.   Today's 802.11n and 
its successors approach 1 GigE speeds, and 1 GigE is the standard wiring for 
most homes, etc.   It does not matter how the underlying radio links chop up 
the Ethernet frames, retransmit them, ack them, etc.   The value I am 
discussing is at the *endpoints*.
 
It's tempting for transport link providers to *ignore* TCP and so forth when 
they design their transports, and focus only on transport-level efficiencies 
and reliabilities.  This temptation created bufferbloat and also the excessive 
retry problem.   (In the past it also created the historical predecessor of 
"bufferbloat" - Frame Relay's "Reliable delivery mode", which would go to 
extraordinary lengths to never drop a packet, including storing the packets *on 
disk* in some cases - talk about bloated buffers!)
 
The conversation here (including, but not limited to Taht's comments) shows 
exactly that *temptation*.
 
Aggregation is NOT the same as large frames.  Not at all.  It achieves internal 
efficiencies, but not the endpoint efficiencies of receiving a coherent frame, 
that can be processed immediately and by a single code path.  At 1 Gigabit/sec 
this was important enough to introduce such frame sizes.
 
The alternative ways to achieve the endpoint goals would be to allow reordering 
of data delivery to the endpoint app, perhaps by making SCTP work instead of 
TCP, using a flow/congestion/rate control mechanism other than a window on 
sequence numbers, etc.  But that would mean changing the entire stack to a new 
end-to-end theory of operation.
 
There is a real tradeoff space, but unilaterally declaring that packet 
aggregation is the same as jumbo Ethernet frames is choosing a poor point in 
the tradeoff space.
 
Regarding "header overhead" - that is minor in the scheme of things.  Obsessing 
about that indicates a lack of perspective on the systems level issues.
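
To put rough numbers on that, here is a small sketch of the arithmetic.  It 
assumes plain IPv4 + TCP with no options and standard Ethernet framing 
(preamble, header, FCS, inter-frame gap); the function names are made up for 
illustration only.

```c
/* Per-frame costs on Ethernet, in bytes. */
enum {
        WIRE_OVERHEAD = 8 + 14 + 4 + 12, /* preamble+SFD, header, FCS, inter-frame gap */
        IP_TCP_HDRS   = 20 + 20          /* IPv4 + TCP, no options (assumption) */
};

/* Fraction of wire time that carries TCP payload for a given MTU. */
double payload_efficiency(int mtu)
{
        return (double)(mtu - IP_TCP_HDRS) / (double)(mtu + WIRE_OVERHEAD);
}

/* Full-size frames per second an endpoint must handle at a given bit rate. */
double frames_per_second(double bits_per_second, int mtu)
{
        return bits_per_second / (8.0 * (mtu + WIRE_OVERHEAD));
}
```

Under these assumptions, going from MTU 1500 to MTU 9000 moves payload 
efficiency from about 94.9% to about 99.1%, while the number of full-size 
frames an endpoint has to process per second at 1 Gbit/s drops from roughly 
81,000 to roughly 14,000.  The second number is the endpoint effect; the first 
is the header overhead.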
 
 
 
 
-----Original Message-----
From: "Robert Bradley" <[email protected]>
Sent: Thursday, June 21, 2012 9:33am
To: [email protected]
Subject: Re: [Cerowrt-devel] Baby jumbo frames support?




On 21 June 2012 01:58, Dave Taht <[email protected]> wrote:
> As for PPPoE with a size 1508... um... one or the other device is going
> to get in your way here. I presume that 1500 works? You would do
> better to contact the author of the driver (juhosg) to get your
> question answered as I'm under the impression he is under the right
> NDAs.
>

I think the point here is that MTU=1500 works, but once you add in the
PPPoE header, you end up with an effective MTU of 1492 for outbound
packets:

http://aa.net.uk/kb-broadband-mtu.html
http://tools.ietf.org/html/rfc4638

The short answer is that without baby-jumbo support, you either end up
fragmenting packets or you need to somehow restrict the MTU manually.
You can do that either through MSS clamping or simply configuring each
internal machine to use MTU=1492.  To get around this, the BT ADSL
modems started to support MTU=1508.  This means that the MTU within
the PPPoE tunnel remains at Ethernet-standard 1500, and avoids the
fragmentation or reconfiguration issues.
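
The arithmetic behind those numbers can be sketched as below.  This is only an
illustration: the function names are hypothetical, and the MSS figure assumes
IPv4 and TCP headers without options.

```c
/* PPPoE adds a 6-byte PPPoE header plus a 2-byte PPP protocol ID (RFC 2516). */
enum { PPPOE_OVERHEAD = 8, IPV4_HDR = 20, TCP_HDR = 20 };

/* MTU available to IP inside the PPPoE session for a given Ethernet MTU. */
int pppoe_inner_mtu(int ethernet_mtu)
{
        return ethernet_mtu - PPPOE_OVERHEAD;
}

/* TCP MSS to clamp to for a given inner MTU (IPv4, no options assumed). */
int clamped_mss(int inner_mtu)
{
        return inner_mtu - IPV4_HDR - TCP_HDR;
}
```

With an Ethernet MTU of 1500 the inner MTU is 1492 and the clamped MSS is
1452; with a baby-jumbo Ethernet MTU of 1508 the inner MTU is back at 1500 and
the MSS at the usual 1460.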

As for supporting it in CeroWRT ... the ag71xx driver defines
AG71XX_TX_MTU_LEN=1540, so it looks safe enough to use MTU 1508,
especially if you know that no vlans or other additions to the
standard header will be used.  To enable that, you need to reimplement
the eth_change_mtu function for the driver.  The current code uses the
kernel's implementation, which restricts the MTU to 1500.  An initial,
naive patch would look something like:

----
--- C:/Users/robert/AppData/Local/Temp/ag71x-revBASE.svn000.tmp.c       Mon May 28 03:55:59 2012
+++ C:/Users/robert/Desktop/ag71xx/ag71xx_main.c        Thu Jun 21 13:58:44 2012
@@ -1042,13 +1042,25 @@
 }
 #endif

+/*
+ * Copied from eth_change_mtu and modified so that baby jumbo packets
+ * may be used.  This has not been tested!
+ */
+int ag71xx_change_mtu(struct net_device *dev, int new_mtu)
+{
+        if (new_mtu < 68 || new_mtu > (ETH_DATA_LEN + 8))
+                return -EINVAL;
+        dev->mtu = new_mtu;
+        return 0;
+}
+
 static const struct net_device_ops ag71xx_netdev_ops = {
        .ndo_open               = ag71xx_open,
        .ndo_stop               = ag71xx_stop,
        .ndo_start_xmit         = ag71xx_hard_start_xmit,
        .ndo_do_ioctl           = ag71xx_do_ioctl,
        .ndo_tx_timeout         = ag71xx_tx_timeout,
-       .ndo_change_mtu         = eth_change_mtu,
+       .ndo_change_mtu         = ag71xx_change_mtu,
        .ndo_set_mac_address    = eth_mac_addr,
        .ndo_validate_addr      = eth_validate_addr,
 #ifdef CONFIG_NET_POLL_CONTROLLER

----

where I've copied the original function and changed the upper limit to
ETH_DATA_LEN+8, then set up the netdev_ops structure to call the new
version.  In reality, you probably want to add some better checks
(testing for MTU+all possible headers<1540?) and remove the magic
constant - in the worst case, something closer to the e1000 driver's
implementation.  I wouldn't recommend using the present version on
anything other than an experimental build, but the default MTU would
be 1500 anyway so should avoid causing too much damage.  Those on BT
ADSL lines can change the MTU on ge00 themselves and see what breaks.
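
A slightly stricter check might derive the upper limit from the buffer size
instead of hard-coding the +8.  The following is a user-space sketch only, not
tested driver code.  It assumes that AG71XX_TX_MTU_LEN has to cover the
Ethernet header, one VLAN tag and the FCS, which has not been confirmed for
this hardware; ag71xx_check_mtu, AG71XX_MAX_MTU and MIN_MTU are hypothetical
names, and the kernel constants are redefined so that it builds on its own.

```c
#include <errno.h>

#define AG71XX_TX_MTU_LEN 1540  /* value quoted from the ag71xx driver */
#define ETH_HLEN          14    /* Ethernet header */
#define ETH_FCS_LEN       4     /* frame check sequence */
#define VLAN_HLEN         4     /* one 802.1Q tag */
#define MIN_MTU           68    /* lower bound used by eth_change_mtu */

/* Assumption: the hardware limit counts header, one VLAN tag and FCS. */
#define AG71XX_MAX_MTU \
        (AG71XX_TX_MTU_LEN - ETH_HLEN - VLAN_HLEN - ETH_FCS_LEN)

/* Returns 0 if new_mtu fits under the assumed limit, -EINVAL otherwise. */
int ag71xx_check_mtu(int new_mtu)
{
        if (new_mtu < MIN_MTU || new_mtu > AG71XX_MAX_MTU)
                return -EINVAL;
        return 0;
}
```

Under that assumption the largest accepted MTU works out to 1518, so 1508
still passes with room for a VLAN tag.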
-- 
Robert Bradley
_______________________________________________
Cerowrt-devel mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cerowrt-devel
