PR #21427 opened by Zhao Zhili (quink)
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21427
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21427.patch

Manually unrolling loops increases code size, which can sometimes
improve performance, but more often than not, it degrades performance.
Keep the C version simple, and add assembly optimizations when needed.

                 x86-clang    x86-gcc-arch-native  x86-msvc     m1-clang      
rpi5-clang       pi5-gcc-14
-------------------------------------------------------------------------------------------------------------
bswap_buf_c      57.3 ( 1.00x)  19.4 ( 1.00x)   55.4 ( 1.00x)   0.5 ( 1.00x)  
143.5 ( 1.00x)   59.8 ( 1.00x)
bswap_buf_this*  49.0 ( 1.17x)  12.5 ( 1.56x)   17.7 ( 3.13x)   0.3 ( 2.04x)   
57.9 ( 2.48x)   73.5 ( 0.81x)
bswap_buf_sse2   28.4 ( 2.02x)  24.3 ( 0.80x)   25.5 ( 2.18x)   -              
-               -
bswap_buf_ssse3  24.6 ( 2.32x)  16.0 ( 1.22x)   19.0 ( 2.92x)   -              
-               -
bswap_buf_avx2   21.2 ( 2.70x)  11.1 ( 1.74x)   11.2 ( 4.95x)   -              
-               -

bswap_buf_c: C implementation before this patch
bswap_buf_this: C implementation after this patch

Signed-off-by: Zhao Zhili <[email protected]>


>From e41f8e1ac5984bc9ba7abc21fcf49fc55b666e93 Mon Sep 17 00:00:00 2001
From: Zhao Zhili <[email protected]>
Date: Sat, 10 Jan 2026 22:07:16 +0800
Subject: [PATCH] avcodec/bswapdsp: improve performance by remove manually
 unroll

Manually unrolling loops increases code size, which can sometimes
improve performance, but more often than not, it degrades performance.
Keep the C version simple, and add assembly optimizations when needed.

                 x86-clang    x86-gcc-arch-native  x86-msvc     m1-clang      
rpi5-clang       pi5-gcc-14
-------------------------------------------------------------------------------------------------------------
bswap_buf_c      57.3 ( 1.00x)  19.4 ( 1.00x)   55.4 ( 1.00x)   0.5 ( 1.00x)  
143.5 ( 1.00x)   59.8 ( 1.00x)
bswap_buf_this*  49.0 ( 1.17x)  12.5 ( 1.56x)   17.7 ( 3.13x)   0.3 ( 2.04x)   
57.9 ( 2.48x)   73.5 ( 0.81x)
bswap_buf_sse2   28.4 ( 2.02x)  24.3 ( 0.80x)   25.5 ( 2.18x)   -              
-               -
bswap_buf_ssse3  24.6 ( 2.32x)  16.0 ( 1.22x)   19.0 ( 2.92x)   -              
-               -
bswap_buf_avx2   21.2 ( 2.70x)  11.1 ( 1.74x)   11.2 ( 4.95x)   -              
-               -

bswap_buf_c: C implementation before this patch
bswap_buf_this: C implementation after this patch

Signed-off-by: Zhao Zhili <[email protected]>
---
 libavcodec/bswapdsp.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/libavcodec/bswapdsp.c b/libavcodec/bswapdsp.c
index f375ab79ac..266aeca44a 100644
--- a/libavcodec/bswapdsp.c
+++ b/libavcodec/bswapdsp.c
@@ -24,19 +24,7 @@
 
 static void bswap_buf(uint32_t *dst, const uint32_t *src, int w)
 {
-    int i;
-
-    for (i = 0; i + 8 <= w; i += 8) {
-        dst[i + 0] = av_bswap32(src[i + 0]);
-        dst[i + 1] = av_bswap32(src[i + 1]);
-        dst[i + 2] = av_bswap32(src[i + 2]);
-        dst[i + 3] = av_bswap32(src[i + 3]);
-        dst[i + 4] = av_bswap32(src[i + 4]);
-        dst[i + 5] = av_bswap32(src[i + 5]);
-        dst[i + 6] = av_bswap32(src[i + 6]);
-        dst[i + 7] = av_bswap32(src[i + 7]);
-    }
-    for (; i < w; i++)
+    for (int i = 0; i < w; i++)
         dst[i + 0] = av_bswap32(src[i + 0]);
 }
 
-- 
2.49.1

_______________________________________________
ffmpeg-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to