Andrew Gallatin wrote:
> Garrett D'Amore writes:
> > Andrew Gallatin wrote:
> > > Garrett D'Amore writes:
> > > > Andrew Gallatin wrote:
> > > > > Garrett D'Amore writes:
> > > > > > The problem here is that the only reason to lower the MTU is to deal
> > > > > > with cases where Path MTU discovery fails. For example, lowering the
> > > > > > MTU because your upstream provider doesn't properly deal with frames
> > > > > > larger than a PPP size or somesuch.
> > > > > >
> > > > > > It's frustrating that these cases still exist, but they do. In general,
> > > > > > I agree that lowering the MTU should not be necessary. And indeed,
> > > > > > frankly nobody should need to touch the values provided by the media
> > > > > > drivers when everything works properly.
> > > > >
> > > > > You may want to touch the values in order to reduce memory usage if
> > > > > you know you cannot use jumbo frames. Since most drivers manage their
> > > > > own receive buffers, this can add up. For example, my 10GbE driver,
> > > > > depending on load, may allocate up to a (tunable) maximum of 4096
> > > > > receive buffers. The difference between 4096 1500-byte and 9000-byte
> > > > > frames is nearly 30MB.
> > > > >
> > > > > It would be nice if the driver could be notified that the MTU is
> > > > > changing so that it can re-allocate appropriately sized receive
> > > > > buffers. Every other *nix that I've worked with does this.
> > > > >
> > > >
> > > > Okay, fair enough. :-)
> > > >
> > > > Btw, I am *hopeful* that one day in the future Nemo will provide buffer
> > > > management on behalf of drivers. This will address some of the
> > > > long-standing races with "loan-up", and free drivers from making poor
> > > > decisions as to when to bcopy or use loan up. (Or maybe just allocate a
> > > > new DMA or DVMA buffer....)
> > >
> > > Or maybe just fix the IOMMU problem..
> > >
> > > The main reason drivers have to do any of this loaning or bcopying
> > > nonsense is because translating a kernel virtual to a DMA address on
> > > IOMMU-infected systems is so horribly expensive. The one (only?)
> > > thing MacOSX got right in its network buffer management is that it
> > > pre-enters all network buffers into the IOMMU(s), so that obtaining a
> > > DMA address is just a simple table lookup, without any hardware
> > > interaction.
> > >
> >
> > But some Sun drivers do this as well... hence dvma_reserve().
> >
> > The problem, as I understand it, is that even this requires buffers to
> > be reused. For packets that are loaned up in the stack, there is no
> > guarantee that they will be returned in a timely fashion to the driver.
> > So we still wind up seeing the cost of bcopy come up from time to time.
>
> What I'm proposing, and I may be all wet, is making allocb() do the
> equivalent of dvma_reserve for all the memory it manages. This would
> have the advantage of avoiding IOMMU overheads on the transmit side as
> well.
>
Hmm... maybe a special version of allocb()? (Using it for all
allocb() calls would be a terrible idea, IMO, because mblks are used all
over the system and a lot of them never touch hardware, e.g. DLPI
control messages.)
It certainly bears some more consideration.
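
To make the idea under discussion concrete, here is a toy model of a pre-mapped buffer pool. It is only a sketch under stated assumptions: the class name, the addresses, and the contiguous layout are all hypothetical, and it does not represent the allocb() or dvma_reserve() interfaces. It shows only why translation becomes a table lookup once the mappings are set up ahead of time.

```python
class PremappedPool:
    """Toy model (hypothetical): a contiguous pool mapped for DMA once, up front."""

    def __init__(self, kva_base, dma_base, buf_size, count):
        self.kva_base = kva_base
        self.buf_size = buf_size
        self.count = count
        # The one-time mapping cost is paid here: record a DMA address
        # for every buffer in the pool.
        self.dma_table = [dma_base + i * buf_size for i in range(count)]

    def dma_addr(self, kva):
        """Translate a kernel virtual address to a DMA address by lookup."""
        offset = kva - self.kva_base
        if not 0 <= offset < self.buf_size * self.count:
            raise ValueError("address is not in the pre-mapped pool")
        index, within = divmod(offset, self.buf_size)
        return self.dma_table[index] + within


# Example with made-up addresses: four 2048-byte buffers.
pool = PremappedPool(kva_base=0x10000000, dma_base=0x80000000,
                     buf_size=2048, count=4)
addr = pool.dma_addr(0x10000000 + 2 * 2048 + 10)
```

Buffers allocated outside such a pool (DLPI control messages, say) would never pay the mapping cost, which is the argument for a separate allocator rather than changing every allocb().
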
> > Of course, in general, the stack does return large buffers back to
> > userland ... it is most likely to "hang on" to smaller packets, which
> > may be better served by a bcopy anyway.
>
> In general, but you can always contrive a special case where
> you've got a ton of non-consuming sockets with large socket
> buffer sizes..
>
True.
-- Garrett
> Drew
>
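
For reference, the receive-buffer figure quoted earlier in the thread (4096 buffers, 1500-byte versus 9000-byte frames) can be checked with a few lines. This is a rough sketch that assumes each buffer is exactly the frame size; a real driver would add header, alignment, and allocator overhead, so the true difference would be somewhat larger.

```python
def rx_pool_bytes(num_buffers, frame_bytes):
    """Memory for a receive pool, assuming one buffer of frame_bytes per slot."""
    return num_buffers * frame_bytes


NUM_BUFFERS = 4096      # tunable maximum quoted in the thread
STANDARD_FRAME = 1500   # bytes
JUMBO_FRAME = 9000      # bytes

standard = rx_pool_bytes(NUM_BUFFERS, STANDARD_FRAME)
jumbo = rx_pool_bytes(NUM_BUFFERS, JUMBO_FRAME)
extra = jumbo - standard

print(f"standard pool: {standard} bytes")
print(f"jumbo pool:    {jumbo} bytes")
print(f"difference:    {extra} bytes "
      f"({extra / 1e6:.1f} MB, {extra / 2**20:.1f} MiB)")
```

The difference comes to 30,720,000 bytes, which matches the "nearly 30MB" estimate.
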