On 9/20/2023 2:12 PM, Ferruh Yigit wrote:
> On 8/24/2023 8:36 AM, Feifei Wang wrote:
>> Currently, the transmit side frees buffers into the lcore cache and
>> the receive side allocates buffers from the lcore cache. The transmit
>> side typically frees 32 buffers, resulting in 32*8=256B of stores to
>> the lcore cache. The receive side allocates 32 buffers and stores them
>> in the receive side software ring, resulting in 32*8=256B of stores
>> and 256B of loads from the lcore cache.
>>
>> This patch proposes a mechanism to avoid freeing to/allocating from
>> the lcore cache: the receive side frees the buffers from the transmit
>> side directly into its own software ring. This avoids the 256B of
>> loads and stores introduced by the lcore cache, and it also frees up
>> the cache lines used by the lcore cache. We call this mode mbufs
>> recycle mode.
>>
>> In the latest version, mbufs recycle mode is packaged as a separate
>> API. This allows users to change the rxq/txq pairing at run time in
>> the data plane, according to the application's analysis of the packet
>> flow, for example:
>> -----------------------------------------------------------------------
>> Step 1: the upper-layer application analyses the flow direction
>> Step 2: recycle_rxq_info = rte_eth_recycle_rx_queue_info_get(rx_portid,
>>         rx_queueid)
>> Step 3: rte_eth_recycle_mbufs(rx_portid, rx_queueid, tx_portid,
>>         tx_queueid, recycle_rxq_info);
>> Step 4: rte_eth_rx_burst(rx_portid, rx_queueid);
>> Step 5: rte_eth_tx_burst(tx_portid, tx_queueid);
>> -----------------------------------------------------------------------
>> The above lets the user change the rxq/txq pairing at run time,
>> without needing to know the direction of the flow in advance. This
>> effectively expands the use scenarios of mbufs recycle mode.
>>
>> Furthermore, mbufs recycle mode is no longer limited to a single PMD:
>> it can move mbufs between PMDs from different vendors, and it can even
>> put the mbufs anywhere into your Rx mbuf ring, as long as the address
>> of the mbuf ring can be provided. In the latest version, we enable
>> mbufs recycle mode in the i40e and ixgbe PMDs, and we also tried using
>> the i40e driver on Rx and the ixgbe driver on Tx, achieving a 7-9%
>> performance improvement with mbufs recycle mode.
>>
>> Difference between mbuf recycle, the ZC API used in mempool, and the
>> general path:
>> For the general path:
>>   Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
>>   Tx: 32 pkts memcpy from tx_sw_ring to temporary variable +
>>       32 pkts memcpy from temporary variable to mempool cache
>> For the ZC API used in mempool:
>>   Rx: 32 pkts memcpy from mempool cache to rx_sw_ring
>>   Tx: 32 pkts memcpy from tx_sw_ring to zero-copy mempool cache
>>   Reference link:
>>   http://patches.dpdk.org/project/dpdk/patch/[email protected]/
>> For mbufs recycle:
>>   Rx/Tx: 32 pkts memcpy from tx_sw_ring to rx_sw_ring
>> Thus we can see that in one loop, compared to the general path, mbufs
>> recycle mode saves 32+32=64 pkts of memcpy; compared to the ZC API
>> used in mempool, mbufs recycle mode saves 32 pkts of memcpy in each
>> loop. So mbufs recycle has its own benefits.
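
The five steps above can be put together into a minimal polling loop.
The sketch below is illustrative only: it uses the ethdev recycle API
named in the cover letter (rte_eth_recycle_rx_queue_info_get and
rte_eth_recycle_mbufs), but the function name, the burst size of 32,
and the port/queue parameters are assumptions for illustration, not
code from this series:

-----------------------------------------------------------------------
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32 /* illustrative burst size */

static void
recycle_fwd_loop(uint16_t rx_port, uint16_t rx_queue,
                 uint16_t tx_port, uint16_t tx_queue)
{
        struct rte_eth_recycle_rxq_info recycle_rxq_info;
        struct rte_mbuf *pkts[BURST_SIZE];
        uint16_t nb_rx, nb_tx;

        /* Step 2: query the Rx ring info once the rxq/txq pairing has
         * been decided; a non-zero return means the driver does not
         * support recycle mode, so fall back to the normal path.
         */
        if (rte_eth_recycle_rx_queue_info_get(rx_port, rx_queue,
                        &recycle_rxq_info) != 0)
                return;

        for (;;) {
                /* Step 3: move spent Tx mbufs straight into the Rx
                 * software ring, bypassing the mempool/lcore cache.
                 */
                rte_eth_recycle_mbufs(rx_port, rx_queue,
                                tx_port, tx_queue, &recycle_rxq_info);

                /* Steps 4-5: the usual receive/transmit burst pair. */
                nb_rx = rte_eth_rx_burst(rx_port, rx_queue,
                                pkts, BURST_SIZE);
                if (nb_rx == 0)
                        continue;
                nb_tx = rte_eth_tx_burst(tx_port, tx_queue,
                                pkts, nb_rx);

                /* Free anything the Tx queue could not take. */
                while (nb_tx < nb_rx)
                        rte_pktmbuf_free(pkts[nb_tx++]);
        }
}
-----------------------------------------------------------------------

Because the rxq/txq pairing is only an argument to
rte_eth_recycle_mbufs, the application can re-run Step 2 against a
different queue at run time to change the pairing, as the cover letter
describes.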
>>
>> Testing status:
>> (1) dpdk l3fwd test with multiple drivers:
>>     port 0: 82599 NIC   port 1: XL710 NIC
>> -------------------------------------------------------------
>>                 Without fast free   With fast free
>> Thunderx2:          +7.53%             +13.54%
>> -------------------------------------------------------------
>>
>> (2) dpdk l3fwd test with same driver:
>>     port 0 && 1: XL710 NIC
>> -------------------------------------------------------------
>>                 Without fast free   With fast free
>> Ampere altra:       +12.61%            +11.42%
>> n1sdp:              +8.30%             +3.85%
>> x86-sse:            +8.43%             +3.72%
>> -------------------------------------------------------------
>>
>> (3) Performance comparison with ZC_mempool used:
>>     port 0 && 1: XL710 NIC, with fast free
>> -------------------------------------------------------------
>>                 With recycle buffer   With zc_mempool
>> Ampere altra:        11.42%               3.54%
>> -------------------------------------------------------------
>>
>> Furthermore, we add a recycle_mbufs engine in testpmd. Because the
>> XL710 NIC has an I/O bottleneck in testpmd on Ampere Altra, we cannot
>> see a throughput change compared with the I/O fwd engine. However,
>> using the record command in testpmd:
>> '$set record-burst-stats on'
>> we can see that the ratio of 'Rx/Tx burst size of 32' is reduced.
>> This indicates that mbufs recycle can save CPU cycles.
>>
>> V2:
>> 1. Use data-plane API to enable direct-rearm (Konstantin, Honnappa)
>> 2. Add 'txq_data_get' API to get txq info for Rx (Konstantin)
>> 3. Use input parameter to enable direct rearm in l3fwd (Konstantin)
>> 4. Add condition detection for direct rearm API (Morten, Andrew
>>    Rybchenko)
>>
>> V3:
>> 1. Separate Rx and Tx operation with two APIs in direct-rearm
>>    (Konstantin)
>> 2. Delete l3fwd change for direct rearm (Jerin)
>> 3. Enable direct rearm in the ixgbe driver on Arm
>>
>> v4:
>> 1. Rename direct-rearm as buffer recycle. Based on this, function and
>>    variable names are changed to make this mode more general for all
>>    drivers. (Konstantin, Morten)
>> 2. Add ring wrapping check (Konstantin)
>>
>> v5:
>> 1. Some changes to the ethdev API (Morten)
>> 2. Add support for the avx2, sse, and altivec paths
>>
>> v6:
>> 1. Fix ixgbe build issue on ppc
>> 2. Remove the 'recycle_tx_mbufs_reuse' and
>>    'recycle_rx_descriptors_refill' API wrappers (Tech Board meeting)
>> 3. Add recycle_mbufs engine in testpmd (Tech Board meeting)
>> 4. Add namespace in the functions related to mbufs recycle (Ferruh)
>>
>> v7:
>> 1. Move the 'rxq/txq data' pointers to the beginning of the eth_dev
>>    structure, in order to keep them in the same cache line as the
>>    rx/tx_burst function pointers (Morten)
>> 2. Add extra description for 'rte_eth_recycle_mbufs' to show that it
>>    can support feeding 1 Rx queue from 2 Tx queues in the same thread
>>    (Konstantin)
>> 3. For the i40e/ixgbe drivers, mark the previously copied buffers as
>>    invalid if there are Tx buffers with refcnt > 1 or from an
>>    unexpected mempool (Konstantin)
>> 4. Add a check for the return value of
>>    'rte_eth_recycle_rx_queue_info_get' in the testpmd fwd engine
>>    (Morten)
>>
>> v8:
>> 1. Add arm/x86 build option to fix the ixgbe build issue on ppc
>>
>> v9:
>> 1. Delete duplicate file name for ixgbe
>>
>> v10:
>> 1. Fix compile issue on Windows
>>
>> v11:
>> 1. Fix doc warning
>>
>> v12:
>> 1. Replace Rx queue check code with the eth_dev_validate_rx_queue
>>    function (Stephen)
>> 2. Put port and queue check before function call (Konstantin)
>>
>> Feifei Wang (4):
>>   ethdev: add API for mbufs recycle mode
>>   net/i40e: implement mbufs recycle mode
>>   net/ixgbe: implement mbufs recycle mode
>>   app/testpmd: add recycle mbufs engine
>>
>
> Thanks for the dedication to improving the patchset and finding a
> better solution, it is appreciated.
>
> Series applied to dpdk-next-net/main, thanks.
>
Konstantin highlighted that there is an outstanding discussion:
http://patchwork.dpdk.org/project/dpdk/patch/[email protected]/

Dropping the patchset from next-net and updating its status in
patchwork to "Changes Requested".

