On Wed, May 13, 2026 at 1:49 PM David Laight
<[email protected]> wrote:
>
> On Wed, 13 May 2026 09:28:40 -0700
> John Ousterhout <[email protected]> wrote:
>
> > On Wed, May 13, 2026 at 2:07 AM David Laight
> > <[email protected]> wrote:
> > >
> > > On Tue, 12 May 2026 11:19:53 -0700
> > > John Ousterhout <[email protected]> wrote:
> > >
> > > > Consider the following sequence of events:
> > > > * The bottom half of a buffer page is filled with data from
> > > >   packet A. The page has a net reference count (reference count
> > > >   - bias) of 1. The page is returned to the NIC, flipped to
> > > >   use the top half.
> > > > * Before the reference on the page is released, the NIC returns
> > > >   the page with no data in it ('size' is zero in ice_clean_rx_irq).
> > > >   In this case the bias does not get decremented. The page still
> > > >   has a net reference count of 1, so it gets returned to the NIC.
> > > >   However, ice_put_rx_mbuf flipped the page so that the bottom
> > > >   half is active.
> > > > * If the NIC stores another packet in the page before packet A
> > > >   has released its reference, the data in packet A will be
> > > >   overwritten with data from the new packet.
> > > > * Unfortunately zero-length buffers occur frequently: they seem
> > > >   to occur whenever a packet uses every available byte in a
> > > >   buffer, ending precisely at the end of the buffer. When this
> > > >   happens the NIC seems to generate an extra zero-length
> > > >   buffer.
> > > > The fix is for ice_put_rx_mbuf not to flip pages that have a
> > > > size of 0.
> > >
> > > How is this different from packet B (in the top half) being
> > > freed before packet A (in the bottom half)?
> >
> > I'm not sure exactly what you're referring to here. Are you asking
> > about a situation where both halves of the page get filled with packet
> > data and then the second half to be filled is the first to be freed? I
> > believe that the ICE driver abandons a page if both halves are ever
> > occupied simultaneously; the page will be returned to the system once
> > both halves have dropped their references. Thus it doesn't matter
> > which half is freed first.
>
> That is what I was thinking, seems like the logic is over complicated.
>
> If you need to put 4k pages into some kind of iommu rather than 2k buffers
> (to contain 1536 byte ethernet packets) then I'd have thought you'd
> initially put both halves into adjacent tx ring entries.
> If a rx buffer is discarded (eg a zero length fragment or a CRC error,
> or even 'copy break' for short packets) then, as an optimisation,
> you could reuse the buffer for another receive.
> The same could be done if the page is freed by an application.
>
> However it sounds like it doesn't use the 2nd half until the first
> completes - otherwise you'd never 'flip' to make the other half
> active.
>
> Thinks...
> By only putting half of each 4k 'page' into the rx ring the code
> will usually save (expensive) iommu setup in the (probably) normal
> case where the buffers are freed 'reasonably quickly'.
> But that really requires a 'free/with_nic/busy' state for each half
> rather then trying to guess from a reference count.
>
> But if the low-level code is recycling the rx buffer (for any reason)
> it wants to use the same buffer.
>
> The ethernet driver I wrote (a long time ago, early 90s) allocated
> 64k as 128 512byte buffers and did an aligned word-sized copy of
> every receive frame - most frames were in contiguous memory.
> The simplicity of it made up for the cost of the copy, especially
> since that was an iommu system.

I'm not here to defend the logic (and it has been replaced with
something that is probably simpler and more efficient); I'm just
suggesting a bug fix for the stable releases that still have this
logic.

-John-

Reply via email to