Andrew Gallatin wrote:
> Garrett D'Amore writes:
>  > Andrew Gallatin wrote:
>  > > Garrett D'Amore writes:
>  > >  > Andrew Gallatin wrote:
>  > >  > > Garrett D'Amore writes:
>  > >  > >  > The problem here is that the only reason to lower the MTU is to deal
>  > >  > >  > with cases where Path MTU discovery fails.  For example, lowering the
>  > >  > >  > MTU because your upstream provider doesn't properly deal with frames
>  > >  > >  > larger than a PPP size or somesuch.
>  > >  > >  >
>  > >  > >  > It's frustrating that these cases still exist, but they do.  In general,
>  > >  > >  > I agree that lowering the MTU should not be necessary.  And indeed,
>  > >  > >  > frankly nobody should need to touch the values provided by the media
>  > >  > >  > drivers when everything works properly.
>  > >  > >
>  > >  > > You may want to touch the values in order to reduce memory usage if
>  > >  > > you know you cannot use jumbo frames.  Since most drivers manage their
>  > >  > > own receive buffers, this can add up.  For example, my 10GbE driver,
>  > >  > > depending on load, may allocate up to a (tunable) maximum of 4096
>  > >  > > receive buffers.  The difference between 4096 1500-byte and 9000-byte
>  > >  > > frames is nearly 30MB.
>  > >  > >
>  > >  > > It would be nice if the driver could be notified that the MTU is
>  > >  > > changing so that it can re-allocate appropriately sized receive
>  > >  > > buffers.  Every other *nix that I've worked with does this.
>  > >  > >   
>  > >  > 
>  > >  > Okay, fair enough. :-)
>  > >  > 
>  > >  > Btw, I am *hopeful* that one day in the future Nemo will provide buffer
>  > >  > management on behalf of drivers.  This will address some of the
>  > >  > long-standing races with "loan-up", and free drivers from making poor
>  > >  > decisions as to when to bcopy or use loan up.  (Or maybe just allocate a
>  > >  > new DMA or DVMA buffer....)
>  > >
>  > > Or maybe just fix the IOMMU problem..
>  > >
>  > > The main reason drivers have to do any of this loaning or bcopying
>  > > nonsense is because translating a kernel virtual address to a DMA
>  > > address on IOMMU-infected systems is so horribly expensive.  The one
>  > > (only?) thing MacOSX got right in its network buffer management is that
>  > > it pre-enters all network buffers into the IOMMU(s), so that obtaining a
>  > > DMA address is just a simple table lookup, without any hardware
>  > > interaction.
>  > >   
>  > 
>  > But some Sun drivers do this as well... hence dvma_reserve().
>  > 
>  > The problem, as I understand it, is that even this requires buffers to 
>  > be reused.  For packets that are loaned up in the stack, there is no 
>  > guarantee that they will be returned in a timely fashion to the driver.  
>  > So we still wind up seeing the cost of bcopy come up from time to time.
>
> What I'm proposing, and I may be all wet, is making allocb() do the
> equivalent of dvma_reserve for all the memory it manages.  This would
> have the advantage of avoiding IOMMU overheads on the transmit side as
> well.
>   

Hmm... maybe a special version of allocb()?  (Using it for all
allocb()'s would be a terrible idea, IMO, because mblks are used all
over the system, and a lot of them never touch hardware, e.g. DLPI
control messages.)

It certainly bears some more consideration.

>  > Of course, in general, the stack does return large buffers back to 
>  > userland ... it is most likely to "hang on" to smaller packets, which 
>  > may be better served by a bcopy anyway.
>
> In general, but you can always contrive a special case where
> you've got a ton of non-consuming sockets with large socket
> buffer sizes..
>   

True.

    -- Garrett
> Drew
>   

