On Tue, 13 Jan 2026 15:13:24 +0000 Haiyang Zhang wrote:
> > > I get that. What is the logic for combining 4 packets into a single
> > > completion? How does it work? Your commit message mentions "regression
> > > on latency" - what is the bound on that regression?
> >
> > When we received CQE type CQE_RX_COALESCED_4, it's a coalesced CQE. And in
> > the CQE OOB, there is an array with 4 PPI elements, with each pkt's length:
> > oob->ppi[i].pkt_len.
> >
> > So we read the related WQE and the DMA buffers for the RX pkt payloads, up
> > to 4.
> > But, if the coalesced pkts <4, the pkt_len will be 0 after the last pkt,
> > so we know when to stop reading the WQEs.
> > And, the coalescing can add up to 2 microseconds into one-way latency.
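
The driver side of the description above can be sketched roughly as
follows. This is only an illustration of the stop condition, not the
actual driver code: the struct layouts, the RX_COALESCED_MAX macro and
the helper name are hypothetical, and only oob->ppi[i].pkt_len and the
limit of 4 packets come from the description.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed from the CQE_RX_COALESCED_4 name: at most 4 packets per CQE. */
#define RX_COALESCED_MAX 4

/* Hypothetical, simplified layouts; the real OOB carries more fields. */
struct rx_ppi {
	uint32_t pkt_len;	/* 0 marks "no packet in this slot" */
};

struct rx_cqe_oob {
	struct rx_ppi ppi[RX_COALESCED_MAX];
};

/*
 * Count the packets carried by one coalesced CQE: walk the PPI array
 * and stop at the first zero pkt_len or after RX_COALESCED_MAX entries.
 * The caller would then consume that many WQEs / DMA buffers.
 */
static size_t rx_coalesced_pkt_count(const struct rx_cqe_oob *oob)
{
	size_t n;

	for (n = 0; n < RX_COALESCED_MAX; n++) {
		if (oob->ppi[n].pkt_len == 0)
			break;
	}

	return n;
}
```

That covers how a partial CQE is parsed. It says nothing about when the
device chooses to emit one.
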

I am asking you how the _device_ (hypervisor?) decides when to coalesce
and when to send a partial CQE (<4 packets in 4 pkt CQE). You are using
the coalescing uAPI, so I'm trying to make sure this is the correct API.
CQE configuration can also be done via ringparam.
