RE: [PATCH v4] eal/x86: optimize memcpy of small sizes

Morten Brørup Tue, 25 Nov 2025 00:19:23 -0800

> Also, all uses of SSE2 _mm_loadu_si128() intrinsics were upgraded to
> SSE3 _mm_lddqu_si128().
> The Intel Intrinsics Guide notes that it may perform better when the
> data crosses a cache line boundary.


It turns out _mm_lddqu_si128() is much slower than _mm_loadu_si128().
Would have been nice if the Intel Intrinsics Guide mentioned that.
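
For reference, a minimal sketch of the two load variants in question. This is not the code from the patch: the helper names copy16_loadu(), copy16_lddqu() and copies_match() are hypothetical, and the sketch assumes an x86-64 target built with GCC or Clang. It only shows that the two intrinsics are functionally interchangeable for a 16-byte unaligned load, so the choice between them is purely a performance question. It does not measure the slowdown.

```c
#include <assert.h>    /* only needed for the usage check shown below */
#include <emmintrin.h> /* SSE2: _mm_loadu_si128(), _mm_storeu_si128() */
#include <pmmintrin.h> /* SSE3: _mm_lddqu_si128() */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 16-byte copy using the SSE2 unaligned load. */
static void
copy16_loadu(uint8_t *dst, const uint8_t *src)
{
	__m128i v = _mm_loadu_si128((const __m128i *)src);
	_mm_storeu_si128((__m128i *)dst, v);
}

/*
 * Hypothetical helper: same copy using the SSE3 load.
 * The target attribute (GCC/Clang specific) lets this compile
 * even when SSE3 is not enabled for the whole translation unit.
 */
static __attribute__((target("sse3"))) void
copy16_lddqu(uint8_t *dst, const uint8_t *src)
{
	__m128i v = _mm_lddqu_si128((const __m128i *)src);
	_mm_storeu_si128((__m128i *)dst, v);
}

/* Returns 1 if both variants copy the same 16 bytes from an offset source. */
static int
copies_match(void)
{
	uint8_t buf[32];
	uint8_t a[16], b[16];
	int i;

	for (i = 0; i < 32; i++)
		buf[i] = (uint8_t)(i * 7 + 1);

	/* buf + 1 is deliberately offset by one byte from the array start. */
	copy16_loadu(a, buf + 1);
	copy16_lddqu(b, buf + 1);

	return memcmp(a, b, 16) == 0 && memcmp(a, buf + 1, 16) == 0;
}
```

A caller could check the equivalence with assert(copies_match()); any timing comparison would need a separate benchmark on the target CPU.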

Marked v4 patch as Not Applicable, and changed v3 patch back to New.
