> From: Stephen Hemminger [mailto:step...@networkplumber.org]
> Sent: Wednesday, 30 October 2024 17.07
> 
> On Wed, 30 Oct 2024 16:40:10 +0100
> Lukáš Šišmiš <sis...@cesnet.cz> wrote:
> 
> > On 30. 10. 24 16:20, Stephen Hemminger wrote:
> > > On Wed, 30 Oct 2024 14:58:40 +0100
> > > Lukáš Šišmiš <sis...@cesnet.cz> wrote:
> > >
> > >> On 29. 10. 24 15:37, Morten Brørup wrote:
> > >>>> From: Lukas Sismis [mailto:sis...@cesnet.cz]
> > >>>> Sent: Tuesday, 29 October 2024 13.49
> > >>>>
> > >>>> Intel PMDs are capped by default to only 4096 RX/TX descriptors.
> > >>>> This can be limiting for applications requiring larger buffer
> > >>>> capacity. The cap prevented applications from configuring more
> > >>>> descriptors. By buffering more packets in the RX/TX descriptor
> > >>>> rings, applications can better absorb processing peaks.
> > >>>>
> > >>>> Signed-off-by: Lukas Sismis <sis...@cesnet.cz>
> > >>>> ---
> > >>> Seems like a good idea.
> > >>>
> > >>> Has the max number of descriptors been checked against the
> > >>> datasheets for all the affected NIC chips?
> > >>>
> > >> I was hoping to get some feedback on this from the Intel folks.
> > >>
> > >> But it seems like I can change it only for ixgbe (82599) to 32k
> > >> (possibly to 64k - 8); the others - ice (E810) and i40e (X710) - are
> > >> capped at 8k - 32.
> > >>
> > >> I neither have experience with the other drivers nor have them
> > >> available to test, so I will leave them as they are in the follow-up
> > >> version of this patch.
> > >>
> > >> Lukas
> > >>
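
(Side note: rather than relying on the hardcoded caps, an application can
query each port's actual descriptor limits at runtime. A rough sketch of
that - untested, DPDK 19.11+ API, names and values only illustrative,
assuming the port is already probed and configured and a mempool exists:)

#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_mempool.h>

/* Ask the PMD what it really supports and let ethdev clamp the request. */
static int setup_big_rx_ring(uint16_t port_id, struct rte_mempool *mp)
{
    struct rte_eth_dev_info dev_info;
    uint16_t nb_rxd = 32768;   /* desired RX ring size (illustrative) */
    uint16_t nb_txd = 4096;

    if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
        return -1;

    /* rx_desc_lim reflects the driver's actual min/max/alignment. */
    printf("port %u: RX desc min=%u max=%u align=%u\n", port_id,
           dev_info.rx_desc_lim.nb_min, dev_info.rx_desc_lim.nb_max,
           dev_info.rx_desc_lim.nb_align);

    /* Clamp the requested sizes to what the driver allows. */
    if (rte_eth_dev_adjust_nb_rx_tx_desc(port_id, &nb_rxd, &nb_txd) != 0)
        return -1;

    /* Must be called after rte_eth_dev_configure(), before dev_start(). */
    return rte_eth_rx_queue_setup(port_id, 0, nb_rxd,
                                  rte_eth_dev_socket_id(port_id),
                                  NULL, mp);
}
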
> > > Having a large number of descriptors, especially at lower speeds, will
> > > increase buffer bloat. For real-life applications, you do not want to
> > > increase latency by more than 1 ms.
> > >
> > > 10 Gbps has 7.62 Gbps of effective bandwidth due to overhead.
> > > The rate for a 1500-byte MTU is 7.62 Gbps / (1500 * 8 bits) = 635 Kpps
> > > (i.e. ~1.5 us per packet).
> > > A ring of 4096 descriptors can hold about 6 ms worth of full-size
> > > packets.
> > >
> > > Be careful: optimizing for 64-byte benchmarks can be a disaster in the
> > > real world.
> > >
> > Thanks for the info, Stephen, however I am not trying to optimize for
> > 64-byte benchmarks. The work was initiated by an IO problem with Intel
> > NICs. A Suricata IDS worker (1 core per queue) receives a burst of
> > packets and then processes them sequentially, one by one. It seems that
> > having 4k buffers is not enough. NVIDIA NICs allow e.g. 32k descriptors
> > and it works fine. In the end it also worked fine when the ixgbe
> > descriptor count was increased. I am not sure why AF_PACKET can handle
> > this much better than DPDK; AF_PACKET doesn't have a crazy high number
> > of descriptors configured (<= 4096), yet it works better. At the moment
> > I assume there is internal buffering in the kernel which allows it to
> > handle processing spikes.
> >
> > To give more context, here is the forum discussion:
> > https://forum.suricata.io/t/high-packet-drop-rate-with-dpdk-compared-to-af-packet-in-suricata-7-0-7/4896
> >
> 
> I suspect AF_PACKET provides an intermediate step which can buffer more
> or spread out the work.

Agree. It's a Linux scheduling issue.

With DPDK polling, there is no interrupt to alert the kernel scheduler.
If the CPU core running the DPDK polling thread is running some other thread
when the packets arrive at the hardware, the DPDK polling thread is NOT
scheduled immediately, but has to wait for the kernel scheduler to switch to
it instead of the other thread.
Quite a lot of time can pass before this happens - the kernel scheduler does 
not know that the DPDK polling thread has urgent work pending.
And the number of RX descriptors needs to be big enough to absorb all packets 
arriving during the scheduling delay.
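
For a rough sense of the sizing, here is a back-of-envelope sketch using
Stephen's ~635 Kpps figure for 1500-byte frames at 10 Gbps (numbers purely
illustrative):

#include <stdio.h>

/* RX descriptors needed to absorb a scheduling stall without drops,
 * assuming a constant arrival rate. */
int main(void)
{
    const double pps = 635000.0;            /* ~10 Gbps of 1500-byte frames */
    const double stall_ms[] = { 1.0, 3.0, 6.5 };

    for (int i = 0; i < 3; i++)
        printf("%.1f ms stall -> ~%.0f descriptors\n",
               stall_ms[i], pps * stall_ms[i] / 1000.0);
    /* Prints ~635, ~1905 and ~4128 - i.e. a stall of roughly 6.5 ms
     * already overflows a 4096-entry ring at this rate. */
    return 0;
}
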
It is not well described how to *guarantee* that nothing but the DPDK polling 
thread runs on a dedicated CPU core.
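
The usual attempt is to combine kernel-side isolation with explicit EAL core
pinning; this is only a sketch of that common approach (DPDK 20.11+ option
names), not a guarantee:

#include <rte_eal.h>

/* Sketch only: pin EAL lcores explicitly and rely on kernel-side isolation
 * (e.g. booting with isolcpus=2 nohz_full=2 rcu_nocbs=2, not shown here)
 * to keep other tasks off the polling core. */
int main(int argc, char **argv)
{
    /* Normally passed on the command line; hard-coded here for clarity.
     * lcore 0 = main/control thread, lcore 2 = isolated RX polling core. */
    char *eal_argv[] = { argv[0], "-l", "0,2", "--main-lcore", "0" };

    (void)argc;
    if (rte_eal_init(5, eal_argv) < 0)
        return 1;

    /* The RX polling loop would then be launched on lcore 2 with
     * rte_eal_remote_launch(poll_loop, NULL, 2); */
    return rte_eal_cleanup();
}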


With AF_PACKET, the hardware generates an interrupt, and the kernel immediately
calls the driver's interrupt handler - regardless of what the CPU core is
currently doing.
The driver's interrupt handler acknowledges the interrupt to the hardware and 
informs the kernel that the softirq handler is pending.
AFAIU, the kernel executes pending softirq handlers immediately after returning
from an interrupt handler - regardless of what the CPU core was doing when the
interrupt occurred.
The softirq handler then dequeues the packets from the hardware RX descriptors
into SKBs, and when all of them have been dequeued from the hardware, it
re-enables interrupts. Then the CPU core resumes the work it was doing when
interrupted.
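
AFAIU, that kernel-side ring (filled from softirq context) is what absorbs
the bursts: the application maps a buffer that can be far larger than the
NIC's descriptor ring. A minimal sketch of sizing it with TPACKET_V3 -
untested, values illustrative, error handling and the bind() to an interface
omitted (IIRC Suricata exposes the same knobs as ring-size/block-size in the
af-packet section of suricata.yaml):

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>

/* Create an AF_PACKET socket with a large TPACKET_V3 RX ring
 * (needs CAP_NET_RAW; binding to an interface is omitted here). */
static void *setup_af_packet_ring(int *out_fd)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    int version = TPACKET_V3;
    struct tpacket_req3 req;

    memset(&req, 0, sizeof(req));
    req.tp_block_size = 1 << 22;                    /* 4 MiB per block    */
    req.tp_block_nr   = 64;                         /* 256 MiB in total   */
    req.tp_frame_size = 1 << 11;                    /* 2 KiB frame slots  */
    req.tp_frame_nr   = req.tp_block_size / req.tp_frame_size * req.tp_block_nr;
    req.tp_retire_blk_tov = 60;                     /* block timeout (ms) */

    setsockopt(fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version));
    setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req));

    *out_fd = fd;
    return mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}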
