RE: [PATCH v6] eal/x86: optimize memcpy of small sizes

Liangxing Wang Wed, 21 Jan 2026 23:38:50 -0800

Hi @Varghese, Vipin,

I am also interested in the performance of this rte_memcpy patch on our x86 platform.
Could you please share the configuration file and the steps for your dma-perf test
so that I can run the same test on our platform? Thanks.


Regards,
Liangxing
> -----Original Message-----
> From: Varghese, Vipin <[email protected]>
> Sent: January 22, 2026 15:00
> To: Varghese, Vipin <[email protected]>; Morten Brørup
> <[email protected]>; Stephen Hemminger
> <[email protected]>; P, Thiyagarajan <[email protected]>;
> Murali Krishna, Bala <[email protected]>
> Cc: [email protected]; Bruce Richardson <[email protected]>;
> Konstantin Ananyev <[email protected]>
> Subject: RE: [PATCH v6] eal/x86: optimize memcpy of small sizes
> 
> [Public]
> 
> Hi @Morten Brørup,
> 
> We (@P, Thiyagarajan, @Murali Krishna, Bala and I) used dma-perf
> to validate the performance for payload sizes from 1B to 17B.
> Our observations follow.
> 
> With c_args `-DRTE_MEMCPY_AVX512` enabled on zen4, we observe a performance
> regression of around 25% for payload sizes 1B to 15B and 17B,
> while for 16B we see a 40% improvement in Mops.
> 
> Without c_args `-DRTE_MEMCPY_AVX512` enabled on zen4, we observe ±4%
> variation for payload sizes from 1B to 17B.
> 
> `We are investigating; the variation is more prominent with avx512 memcpy.`
> 
> Note:
> 1. On zen4, ld|str is broken into 32B, while on zen5 ld|str is 64B.
> 2. We tested memif copy on zen5 with the patch (without -DRTE_MEMCPY_AVX512)
> on 64B and 65B payloads. The result is the same as the zen4 observation
> (shared in the previous email).
> 
> 
> 
> > -----Original Message-----
> > From: Varghese, Vipin <[email protected]>
> > Sent: Wednesday, January 21, 2026 5:19 PM
> > To: Morten Brørup <[email protected]>; Stephen Hemminger
> > <[email protected]>
> > Cc: [email protected]; Bruce Richardson <[email protected]>;
> > Konstantin Ananyev <[email protected]>
> > Subject: RE: [PATCH v6] eal/x86: optimize memcpy of small sizes
> >
> > [Public]
> >
> > Hi @Morten Brørup, please find our observations from running testpmd with
> > memif with zero-copy mode disabled (rte_memcpy enabled).
> >
> > 1. DPDK baseline version: 25.11; we tested with testpmd in io & flowgen mode.
> > 2. Using no cargs for memcpy (rtemov32) and with patch, 64B & 65B: we get
> >    `15.5Mpps`.
> > 3. Using cargs `-DRTE_MEMCPY_AVX512` for memcpy (rtemov64) and with patch,
> >    64B & 65B: we get `14.8Mpps`.
> >
> > We will run the dma-perf application for payload sizes of
> > 1, 2, 3, 4, 5, etc.
> >
> > Regards
> > Vipin Varghese
