On 9/30/2025 11:07 AM, Shaiq Wani wrote:
Some CPUs don't support AVX512. Enable AVX2 for them to
get better per-core performance.

In the single queue model, the same descriptor queue is used by SW
to post descriptors to the device and by the device to report completed
descriptors to SW, whereas the split queue model separates them into
different queues for parallel processing and improved performance.

Signed-off-by: Shaiq Wani <shaiq.w...@intel.com>
---

Hi Shaiq,

+static inline void
+idpf_splitq_vtx_avx2(struct idpf_flex_tx_sched_desc *txdp,
+                               struct rte_mbuf **pkt, uint16_t nb_pkts, uint64_t flags)
+{
+       const uint64_t hi_qw_tmpl = IDPF_TX_DESC_DTYPE_FLEX_FLOW_SCHE |
+               ((uint64_t)flags);
+
+       /* align if needed */
+       if (((uintptr_t)txdp & 0x1F) != 0 && nb_pkts != 0) {
+               idpf_splitq_vtx1_avx2(txdp, *pkt, flags);
+               txdp++, pkt++, nb_pkts--;
+       }
+
+       for (; nb_pkts > 3; txdp += 4, pkt += 4, nb_pkts -= 4) {

Nitpick: elsewhere these '4' constants are written as IDPF_VPMD_DESCS_PER_LOOP (or IDPF_DESCS_PER_LOOP_AVX, which is 8), so it would be nice to reflect that in the loop header, e.g.

for (; nb_pkts >= IDPF_VPMD_DESCS_PER_LOOP;
                txdp += IDPF_VPMD_DESCS_PER_LOOP,
                pkt += IDPF_VPMD_DESCS_PER_LOOP,
                nb_pkts -= IDPF_VPMD_DESCS_PER_LOOP)

Then again, other places in the same file do not do this consistently, so either way would be fine.
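
To illustrate the loop shape being discussed, here is a minimal standalone sketch, not the driver code. The names DESCS_PER_LOOP, struct desc, write_one() and fill_descs() are hypothetical stand-ins, the descriptor is assumed to be 16 bytes, and the scalar tail loop is an assumption, since the quoted hunk stops at the loop header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for IDPF_VPMD_DESCS_PER_LOOP (the literal 4 in the patch). */
#define DESCS_PER_LOOP 4

/* Hypothetical 16-byte descriptor: two 64-bit words. */
struct desc {
	uint64_t lo;
	uint64_t hi;
};

/* Scalar write of one descriptor (plays the role of idpf_splitq_vtx1_avx2). */
static void
write_one(struct desc *d, uint64_t val, uint64_t flags)
{
	d->lo = val;
	d->hi = flags;
}

/* Fill n descriptors; returns how many full batches the main loop handled. */
static unsigned int
fill_descs(struct desc *d, const uint64_t *vals, uint16_t n, uint64_t flags)
{
	unsigned int batches = 0;

	/* Assumption: the ring itself is at least 16-byte aligned. */
	assert(((uintptr_t)d & 0xF) == 0);

	/* align if needed: one 16-byte step reaches a 32-byte boundary */
	if (((uintptr_t)d & 0x1F) != 0 && n != 0) {
		write_one(d, *vals, flags);
		d++, vals++, n--;
	}

	/* main loop, with the named constant instead of literal 3 and 4 */
	for (; n >= DESCS_PER_LOOP;
			d += DESCS_PER_LOOP,
			vals += DESCS_PER_LOOP,
			n -= DESCS_PER_LOOP) {
		for (size_t i = 0; i < DESCS_PER_LOOP; i++)
			write_one(&d[i], vals[i], flags);
		batches++;
	}

	/* assumed tail: handle the remaining 0..3 descriptors one at a time */
	for (; n > 0; d++, vals++, n--)
		write_one(d, *vals, flags);

	return batches;
}
```

With 10 entries and a 32-byte-aligned start, the main loop runs twice and the tail handles two descriptors. Starting one descriptor later, the alignment step takes one, the main loop again runs twice, and the tail handles one. Changing the batch width then only requires touching the constant.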

--
Thanks,
Anatoly
