Sebastien Roy wrote:
> Garrett D'Amore wrote:
>>> For the mtu case: the MTU definition targeted in the doc was intended
>>> to address the driver's default_mtu (the tunable typically tweaked
>>> by setting various params in the driver.conf to get Jumbo Frames).
>>> As you point out, there is a related concept of the GLDv3 max SDU which
>>> can be lowered without involvement from the driver. We could split the
>>> mtu into two items, the "max_mtu" (which would be the classical
>>> ethernet driver's "default_mtu") and the "max_sdu" which would be
>>> recognized and handled in the mac layer itself, without attempting to
>>> invoke the mc_*prop functions. Would this make sense?
>>
>> Lowering the MTU is done at the IP layer.
>
> That's a bug IMO. IP should just use the MTU reported by the layer
> underneath. Any modification of MTU needs to be done from the bottom
> of the stack, not the top.
The problem here is that the only reason to lower the MTU is to deal
with cases where Path MTU discovery fails: for example, lowering the
MTU because your upstream provider doesn't properly deal with frames
larger than a PPP link can carry, or some such.
It's frustrating that these cases still exist, but they do. In general,
I agree that lowering the MTU should not be necessary. And indeed,
frankly nobody should need to touch the values provided by the media
drivers when everything works properly.
>
>> There is little reason to change the SDU size reported by the
>> driver. dladm shouldn't concern itself with that.
>
> I don't know exactly what you're trying to say. Are you saying that
> the driver should report different values for its max SDU in
> mac_register_t and its "default MTU" property? I don't see what the
> point of that would be. They are effectively the same thing...
Sorry, I think I got my neurons crossed when writing this. What I should
have said, IIRC, was that there is little reason for dladm to
support lowering the maximum SDU in the cases where IP is already being
used to do that (e.g. to cope better with PPP, etc.). Basically, the
only reason to support changing the default_mtu via dladm is to cope
with differences in how Ethernet drivers accommodate jumbo frames.
Frankly the situation with jumbo frames right now really ticks me off.
The fact that support is not discoverable, and that drivers often have to
go to extreme efforts to cope with it, is really obnoxious. But fixing
that requires a bigger effort and the involvement of all the Ethernet
players. (It would be nice if I could poll at the 802.3 layer whether
my link partner supports jumbo frames and then, once I've discovered
that it does, probe any potential destination on the same subnet via
ICMP or some such. Then we could do away with this whole bit of tuning,
and just let everything auto-adjust for maximum performance everywhere.)
>
>> The driver's default_mtu is really the value reported via the maximum
>> SDU in Nemo. I'd just leave it "default_mtu" for now.
>
> Right...
>
>> Do we have a way for the upper layers of the stack (IP, TCP, VLAN,
>> aggr) to react to a change in the max_sdu? I don't think so.
>
> Yes. DL_NOTE_SDU_SIZE. It works great with IP tunnels, and I have it
> working for Nemo in the Clearview IP tunneling device driver. It's
> not at all complex as it only required a few lines of code in Nemo to
> get working, and an additional MAC entry point to have the driver tell
> the MAC layer that the max SDU has changed.
>
> I really don't see a reason to restrict changing the MTU to stopped
> MACs, as it's quite trivial to get working.
Okay, well if IP and the upper layers can cope, great! I wasn't aware
of DL_NOTE_SDU_SIZE. A lot of features have been added to IP since
DL_CAPABILITY_REQ was added, I guess. :-)
Will VLANs and aggrs cope with this DL_NOTE_SDU_SIZE? I suppose that
there is a mac_update_xxx of some form for this, as well?
>
>> It's likely that changing this tunable will require replumbing various
>> layers. And, quite frankly, that's probably okay, because when
>> switching to a larger MTU it requires other administrative changes as
>> well. (I.e. you have to make sure all other hosts on the same subnet
>> *also* can cope with large frames.)
>
> Synchronizing the increase of MTU of interfaces of various hosts on
> the network is not related. I don't see why having the following
> steps on 8 different hosts:
>
> ifconfig bge0 unplumb
> dladm set-linkprop default_mtu=9000 bge0
> ifconfig bge0 plumb dhcp
>
> is any better than the following step on those same 8 hosts:
>
> dladm set-linkprop default_mtu=9000 bge0
>
> You cannot atomically plumb every interface on the network, so there's
> going to be some period of time when all of the interfaces are not in
> sync...
What I meant was that it didn't make much sense to burn a whole lot of
cycles trying to fix all the drivers to do this dynamically if IP
needed changing too, particularly since manual changes touching
all the other hosts were probably required anyway. But that's moot,
since the main work in IP has apparently already been done.
>
>> Engineering a solution around this is likely to be "non-trivial".
>
> It is quite trivial. It took maybe 15 minutes to implement with a few
> dozen lines of code in Nemo. IP can already handle max SDU changes
> dynamically, so no changes were needed at that layer.
It was IP that I was concerned about. If it's already done, then great!
>
>> So it may be simpler to just have a way for brussels to report back
>> to the user that the change won't take effect until the upper layers
>> are replumbed.
>
> I disagree.
Given the information you've just supplied me, I concur.
-- Garrett
>
> -Seb