I have another theory/concern, which may or may not be relevant.

Many NICs configure the maximum frame size globally, so that if larger frames are enabled, and if there is a performance penalty for supporting them, *all* frames pay that penalty.

This could mean that Nico's proposed adaptive tuning would not actually *eliminate* the performance bottleneck that Mike was trying to address for the problematic nxge devices in question.

Put another way -- you might not be able to achieve any benefit with a smaller frame size *unless* you are willing to also compromise your *ability* to receive and handle larger frames.

I'd like to know a lot more about the performance bottleneck that nxge was hoping to overcome. Is it sensitive to direction? Will just sending (or receiving) smaller packets avoid the bottleneck, or does the NIC need to be configured in a way that excludes the possibility of handling larger frames? What implications are there for LSO or LRO offload, if any?

All of this suggests (rather strongly) to me that the nxge performance bottleneck is a serious hardware problem; *that* information needs to be communicated (if it hasn't been already) to the nxge team, and we need to be working the problem from that end. This is not, IMO, the sort of problem that can be "fixed" by software, only "worked around". And it seems so terribly device specific that I hate to re-engineer our stack just to accommodate a deficient piece of hardware.

Do other hardware devices suffer from the same problem? Is the problem fundamental to the architecture, or just to the nxge device? (And does the problem affect all nxge devices, or only those in certain slots/configurations?)

   -- Garrett

James Carlson wrote:
Moore, Joe writes:
James Carlson wrote:
That's true, except that this isn't so simple.  The "optimal" MTU to
use also has to take into account the attached hardware and the peers.
At least for Ethernet, the MTU must be set the same on all systems on
a given subnetwork.  That's inescapable.
You're saying that the MTU must be the same for everything on the subnet?  I'm 
not a network guru, but it seems to me that there's some wiggle room in there.

Not for IEEE 802.

All listeners on the network must accept packets up to the MTU of the system 
sending them data.  They don't care about anything else happening on the 
network.

That's true.  You can usually configure your own IP-layer MTU (or
better yet a transport layer segment size) downwards and send smaller
packets if that's what you want to do.
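
Purely as an illustration -- nothing nxge-specific -- an application can
cap its own TCP segment size with the standard TCP_MAXSEG socket option
instead of touching the interface MTU at all; the 1400-byte value below
is arbitrary:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int
    main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int mss = 1400;        /* arbitrary illustrative value */

        if (fd == -1) {
            perror("socket");
            return (1);
        }

        /*
         * TCP_MAXSEG typically has to be set before connect() for it
         * to influence the MSS advertised in the SYN.
         */
        if (setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss,
            sizeof (mss)) == -1)
            perror("setsockopt(TCP_MAXSEG)");

        return (0);
    }

Doing it at the transport layer leaves the MAC-layer MTU -- and everyone
else on the subnet -- alone.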

Setting the MAC-layer MTU differently is hazardous in several cases.
If you do any bridging, it's obviously a non-starter; bridge links
must have an MTU at least as large as the largest packet you'll ever
see, or you'll have black holes in your network.

If you have interfaces that treat MTU as though it were MRU as well
(an apparently common situation), then setting the MTU smaller on the
physical (MAC-layer) interface will produce exactly the sort of broken
behavior that you were excluding.  Thus, it's something to be careful
about, and it depends on internal (and usually undocumented) device
driver design issues.

If you use routing protocols such as IS-IS that depend on the MAC's
MTU, you may end up with surprising results as well.

If there is a driver- and hardware-optimized "Max Transmit TU" and the
subnetwork has a different (but bigger) MTU, wouldn't it make sense to
split out those two tunables?  Default MTTU == MTU, but it could be
tweaked at the driver layer (for example in driver.conf).
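
Purely as a sketch of what that might look like -- the property name
below is made up, not an existing nxge tunable -- something like this
in nxge.conf:

    #
    # Hypothetical "max-transmit-unit" property: a transmit-side tunable
    # split out from the subnetwork MTU.  Not a real nxge.conf property
    # today; shown only to illustrate the idea.
    #
    max-transmit-unit = 1500;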

Or would that be too many network tunables?

I think it is too many.  And worse, it's just too vague.

Optimal in what sense and for whom?  Is it really true that "all"
applications benefit from using exactly that size, or do only
"certain" applications benefit, and if so, which ones?  Does it matter
what is "optimal" for the peer you're talking to, or are local DMA
optimizations the only things that matter in the world?  Does
"optimal" perhaps depend on other factors, such as the use (or
non-use) of IP options, v4 versus v6, or other offload-hampering
issues?

One possibility is having the driver export properties that various
applications and/or transport layers can read and determine what
they'll do to optimize their behavior.  That way, it's not wired into
something as side-effect laden and hard to get right as MTU, and it's
presented in a way that allows us to do the Right Thing over time
(which I think is to adopt the adaptive behavior that Nico described).
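
To make that concrete -- and this is only a sketch, the kstat and
statistic names below are invented, not something any driver exports
today -- an application could pick up such a hint through the ordinary
libkstat interface:

    #include <stdio.h>
    #include <kstat.h>

    int
    main(void)
    {
        kstat_ctl_t *kc = kstat_open();

        if (kc == NULL) {
            perror("kstat_open");
            return (1);
        }

        /* Module "nxge", instance 0; the kstat name "txopt" is hypothetical. */
        kstat_t *ksp = kstat_lookup(kc, "nxge", 0, "txopt");

        if (ksp != NULL && kstat_read(kc, ksp, NULL) != -1) {
            /* "optimal_payload" is likewise a made-up statistic name. */
            kstat_named_t *kn = kstat_data_lookup(ksp, "optimal_payload");

            if (kn != NULL)
                printf("preferred payload: %u bytes\n", kn->value.ui32);
        }

        (void) kstat_close(kc);
        return (0);
    }

The transport layer could consult the same sort of hint in-kernel; the
point is just that it stays advisory rather than being baked into the MTU.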

