Re: [Intel-wired-lan] [PATCH net v3] ice: fix packet corruption due to extraneous page flip

Jacob Keller Thu, 14 May 2026 09:43:59 -0700

On 5/14/2026 3:01 AM, David Laight wrote:
> On Wed, 13 May 2026 21:47:11 -0700
> John Ousterhout <[email protected]> wrote:
> 
>> On Wed, May 13, 2026 at 1:49 PM David Laight
>> <[email protected]> wrote:
>>>
>>> On Wed, 13 May 2026 09:28:40 -0700
>>> John Ousterhout <[email protected]> wrote:
>>>  
>>>> On Wed, May 13, 2026 at 2:07 AM David Laight
>>>> <[email protected]> wrote:  
>>>>>
>>>>> On Tue, 12 May 2026 11:19:53 -0700
>>>>> John Ousterhout <[email protected]> wrote:
>>>>>  
>>>>>> Consider the following sequence of events:
>>>>>> * The bottom half of a buffer page is filled with data from
>>>>>>   packet A. The page has a net reference count (reference count
>>>>>>   - bias) of 1. The page is returned to the NIC, flipped to
>>>>>>   use the top half.
>>>>>> * Before the reference on the page is released, the NIC returns
>>>>>>   the page with no data in it ('size' is zero in ice_clean_rx_irq).
>>>>>>   In this case the bias does not get decremented. The page still
>>>>>>   has a net reference count of 1, so it gets returned to the NIC.
>>>>>>   However, ice_put_rx_mbuf flipped the page so that the bottom
>>>>>>   half is active.
>>>>>> * If the NIC stores another packet in the page before packet A
>>>>>>   has released its reference, the data in packet A will be
>>>>>>   overwritten with data from the new packet.
>>>>>> * Unfortunately zero-length buffers occur frequently: they seem
>>>>>>   to occur whenever a packet uses every available byte in a
>>>>>>   buffer, ending precisely at the end of the buffer. When this
>>>>>>   happens the NIC seems to generate an extra zero-length
>>>>>>   buffer.
>>>>>> The fix is for ice_put_rx_mbuf not to flip pages that have a
>>>>>> size of 0.  
>>>>>
>>>>> How is this different from packet B (in the top half) being
>>>>> freed before packet A (in the bottom half)?  
>>>>
>>>> I'm not sure exactly what you're referring to here. Are you asking
>>>> about a situation where both halves of the page get filled with packet
>>>> data and then the second half to be filled is the first to be freed? I
>>>> believe that the ICE driver abandons a page if both halves are ever
>>>> occupied simultaneously; the page will be returned to the system once
>>>> both halves have dropped their references. Thus it doesn't matter
>>>> which half is freed first.  
>>>
>>> That is what I was thinking, seems like the logic is over complicated.
>>>
>>> If you need to put 4k pages into some kind of iommu rather than 2k buffers
>>> (to contain 1536 byte ethernet packets) then I'd have thought you'd
>>> initially put both halves into adjacent tx ring entries.
>>> If a rx buffer is discarded (eg a zero length fragment or a CRC error,
>>> or even 'copy break' for short packets) then, as an optimisation,
>>> you could reuse the buffer for another receive.
>>> The same could be done if the page is freed by an application.
>>>
>>> However it sounds like it doesn't use the 2nd half until the first
>>> completes - otherwise you'd never 'flip' to make the other half
>>> active.
>>>
>>> Thinks...
>>> By only putting half of each 4k 'page' into the rx ring the code
>>> will usually save (expensive) iommu setup in the (probably) normal
>>> case where the buffers are freed 'reasonably quickly'.
>>> But that really requires a 'free/with_nic/busy' state for each half
>>> rather then trying to guess from a reference count.
>>>
>>> But if the low-level code is recycling the rx buffer (for any reason)
>>> it wants to use the same buffer.
>>>
>>> The ethernet driver I wrote (a long time ago, early 90s) allocated
>>> 64k as 128 512byte buffers and did an aligned word-sized copy of
>>> every receive frame - most frames were in contiguous memory.
>>> The simplicity of it made up for the cost of the copy, especially
>>> since that was an iommu system.  
>>
>> I'm not here to defend the logic (and it has been replaced with
>> something that is probably simpler and more efficient); I'm just
>> suggesting a bug fix for the stable releases that still have this
>> logic.


Right. We definitely want a fix for the possible data corruption in
stable. Ideally one as simple as possible.

> 
> You've forced me to look at all of the function :-)
> I've noticed a few things:
> - If ice_add_xdp_frag() fails (because there are too many fragments)
>   then the rest of the fragments are left in the tx ring (instead
>   of being discarded) - so are likely to be treated as a full packet
>   later on.
> - Frames with status errors (crc, framing etc) are discarded after
>   the skb is built - surely that should happen before the xdp 'program'
>   is called.
> - If the remote system send a very very long frame (traditionally the PHY's
>   'jabber detect' didn't always work) you can end up with all of the rx
>   ring being full of a single partial packet.
> 
> I think you need to avoid calling ice_add_xdp_frag() when 'size == 0'.
> Then in ice_put_rx_mbuf() unconditionally call ice_put_rx_buf() for
> zero length fragments.
> The comment would be 'zero length fragments can always be reused'.
> 

That seems correct.

> The zero length fragments almost certainly exist because the mac hardware
> advances the the new buffer expecting more data - but only gets the 
> 4 byte CRC. So the zero length buffer contains the receive status.
> 

That matches my understanding.

Re: [Intel-wired-lan] [PATCH net v3] ice: fix packet corruption due to extraneous page flip

Reply via email to