On Mon, Jun 30, 2025 at 06:22:11PM +0200, Paul Menzel wrote:
> Dear Josh,
>
>
> On 30.06.25 at 18:08, Hay, Joshua A wrote:
> > > > On 25.06.25 at 18:11, Joshua Hay wrote:
> > > > This series fixes a stability issue in the flow scheduling Tx send/clean
> > > > path that results in a Tx timeout.
> > > >
> > > > The existing guardrails in the Tx path were not sufficient to prevent
> > > > the driver from reusing completion tags that were still in flight (held
> > > > by the HW). This collision would cause the driver to erroneously clean
> > > > the wrong packet thus leaving the descriptor ring in a bad state.
> > > >
> > > > The main point of this refactor is replace the flow scheduling buffer
> > >
> > > … to replace …?
> >
> > Thanks, will fix in v2
> >
> > > > ring with a large pool/array of buffers. The completion tag then simply
> > > > is the index into this array. The driver tracks the free tags and pulls
> > > > the next free one from a refillq. The cleaning routines simply use the
> > > > completion tag from the completion descriptor to index into the array to
> > > > quickly find the buffers to clean.
> > > >
> > > > All of the code to support the refactor is added first to ensure traffic
> > > > still passes with each patch. The final patch then removes all of the
> > > > obsolete stashing code.
> > >
> > > Do you have reproducers for the issue?
> >
> > This issue cannot be reproduced without the customer specific device
> > configuration, but it can impact any traffic once in place.
>
> Interesting. Then it’d be great if you could describe that setup in more
> detail.
>

Hey Paul,

The hardware can process packets and return completions out of order;
this depends on HW configuration that is difficult to replicate.

To match completions with packets, each packet with pending completions
must be associated with a unique ID. The previous code would occasionally
reassign the same ID to multiple pending packets, resulting in resource
leaks and eventually panics. The new code uses a much simpler data
structure to assign IDs that is immune to duplicate assignment and is
also much more efficient at runtime.

> > > > Joshua Hay (5):
> > > >   idpf: add support for Tx refillqs in flow scheduling mode
> > > >   idpf: improve when to set RE bit logic
> > > >   idpf: replace flow scheduling buffer ring with buffer pool
> > > >   idpf: stop Tx if there are insufficient buffer resources
> > > >   idpf: remove obsolete stashing code
> > > >
> > > >  .../ethernet/intel/idpf/idpf_singleq_txrx.c   |   6 +-
> > > >  drivers/net/ethernet/intel/idpf/idpf_txrx.c   | 626 ++++++------------
> > > >  drivers/net/ethernet/intel/idpf/idpf_txrx.h   |  76 +--
> > > >  3 files changed, 239 insertions(+), 469 deletions(-)
> > > Kind regards,
> > Paul
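
To illustrate the scheme discussed in this thread (a pool of buffers
indexed by completion tag, plus a queue of free tags), here is a minimal
userspace sketch in C. It is not the idpf code: tag_pool, tx_buf,
POOL_SIZE and the function names are all hypothetical, and a plain ring
of free tags stands in for the refillq. It only shows why a tag cannot
be handed out twice while its packet is still pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 8u /* hypothetical size; chosen small for illustration */

/* One slot per completion tag; the tag is simply the slot's index. */
struct tx_buf {
	const void *pkt; /* packet held by this slot, NULL when the slot is free */
};

struct tag_pool {
	struct tx_buf bufs[POOL_SIZE];
	uint16_t free_tags[POOL_SIZE]; /* FIFO ring of free tags (refillq stand-in) */
	size_t head;                   /* index of the next free tag to hand out */
	size_t count;                  /* number of free tags currently in the ring */
};

static void tag_pool_init(struct tag_pool *p)
{
	assert(p != NULL);
	for (size_t i = 0; i < POOL_SIZE; i++) {
		p->bufs[i].pkt = NULL;
		p->free_tags[i] = (uint16_t)i; /* every tag starts out free */
	}
	p->head = 0;
	p->count = POOL_SIZE;
}

/*
 * Send path: pull the next free tag and bind it to the packet.
 * Returns false when no tag is free, in which case the caller would
 * have to stop transmitting until completions return some tags.
 */
static bool tag_pool_get(struct tag_pool *p, const void *pkt, uint16_t *tag)
{
	if (pkt == NULL || p->count == 0)
		return false;
	*tag = p->free_tags[p->head];
	p->head = (p->head + 1) % POOL_SIZE;
	p->count--;
	p->bufs[*tag].pkt = pkt;
	return true;
}

/*
 * Clean path: the completion carries the tag, so the slot is found by
 * direct indexing. The tag goes back on the free ring only here, so it
 * cannot be reused while the packet is still pending. Returns NULL for
 * an out-of-range tag or a tag with no pending packet.
 */
static const void *tag_pool_complete(struct tag_pool *p, uint16_t tag)
{
	const void *pkt;

	if (tag >= POOL_SIZE || p->bufs[tag].pkt == NULL)
		return NULL;
	pkt = p->bufs[tag].pkt;
	p->bufs[tag].pkt = NULL;
	p->free_tags[(p->head + p->count) % POOL_SIZE] = tag;
	p->count++;
	return pkt;
}
```

Completions may arrive in any order in this sketch: tag_pool_complete()
never scans, it only indexes, and a second completion for the same tag
is rejected rather than cleaning some other packet.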