Hi @Varghese, Vipin,

I am also interested in how this rte_memcpy patch performs on our x86 platform. Could you please share the configuration file and the steps for your dma-perf test, so that I can run the same test on our platform?

Thanks.
Regards,
Liangxing

> -----Original Message-----
> From: Varghese, Vipin <[email protected]>
> Sent: January 22, 2026 15:00
> To: Varghese, Vipin <[email protected]>; Morten Brørup
> <[email protected]>; Stephen Hemminger
> <[email protected]>; P, Thiyagarajan <[email protected]>;
> Murali Krishna, Bala <[email protected]>
> Cc: [email protected]; Bruce Richardson <[email protected]>;
> Konstantin Ananyev <[email protected]>
> Subject: RE: [PATCH v6] eal/x86: optimize memcpy of small sizes
>
> [Public]
>
> Hi @Morten Brørup,
>
> We (@P, Thiyagarajan @Murali Krishna, Bala and myself) have used dma-perf
> to validate the performance from 1B to 17B payload.
> Following are our observations
>
> With c_args `-DRTE_MEMCPY_AVX512` enabled on zen4, we observe around
> 25% performance regression for payload size 1B to 15B and 17B.
> While in case of 16B we see improvement in Mops by 40%.
>
> Without c_args `-DRTE_MEMCPY_AVX512` enabled on zen4, we observe +-4%
> variation from 1B to 17B.
>
> `We are investigating the variation is more prominent with avx512 memcpy.`
>
> Note:
> 1. in zen4 ld|str is broken to 32B. While in zen5 ld|str is 64B.
> 2. we tested memif copy on zen5 with patch (without -DRTE_MEMCPY_AVX512)
>    on 64B and 65B payload. It is same as zen4 observation (shared in previous
>    email).
>
>
> > -----Original Message-----
> > From: Varghese, Vipin <[email protected]>
> > Sent: Wednesday, January 21, 2026 5:19 PM
> > To: Morten Brørup <[email protected]>; Stephen Hemminger
> > <[email protected]>
> > Cc: [email protected]; Bruce Richardson <[email protected]>;
> > Konstantin Ananyev <[email protected]>
> > Subject: RE: [PATCH v6] eal/x86: optimize memcpy of small sizes
> >
> > Caution: This message originated from an External Source. Use proper
> > caution when opening attachments, clicking links, or responding.
> >
> >
> > [Public]
> >
> > Hi @Morten Brørup, please find our observation running testpmd with
> > memif in zero-copy mode disabled (rte_memcpy enabled).
> >
> > 1. DPDK baseline version: 25.11 we tested with testpmd in io & flowgen
> >    mode
> > 2. using no cargs for memcpy (rtemov32) and with patch 64B & 65B we get
> >    `15.5Mpps`
> > 3. using cargs `-DRTE_MEMCPY_AVX512` for memcpy (rtemov64)
> >    and with patch 64B & 65B we get `14.8Mpps`
> >
> > We will run with dma-perf application for payload sizes of
> > 1,2,3,4,5,...etc
> >
> > Regards
> > Vipin Varghese

