PR #23473 opened by mkver
URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23473
Patch URL: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/23473.patch

This avoids reg-reg moves and saves 90112B of .text here.
It also makes the code less reliant on a clean upper ymm state.

Not all functions use VEX encoding yet. Besides inline assembly
functions, which are not influenced by x86inc.asm, there are also
functions that mix xmm and mmx registers (e.g. in h264_intrapred.asm)
and use INIT_MMX, for which the automatic VEX translation is not
active. This means that some parts of the code still rely on a clean
upper ymm state.

Hint: One could do even more, e.g. remove all the REP_RETs or remove <AVX
functions which are overridden by a <=AVX version. One could also modify
the EXTERNAL_SSE2 etc. macros to actually check for AVX instead of just
presuming it from the fact that the compiler is allowed to use AVX freely.
And one could add an explicit option for this instead of deriving it from
the __AVX__ macro (which should cover GCC, Clang and MSVC).
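As a rough illustration of the first and last suggestions, the following C sketch derives a build-time flag from __AVX__ (the same condition the configure change in this patch tests) and compiles out a pre-AVX function that an AVX version would override anyway. It assumes a dispatch pattern similar to FFmpeg's x86 init functions; the names SKETCH_ASSUME_AVX, SKETCH_FLAG_*, add_c, add_sse2, add_avx and sketch_add_init are hypothetical, and plain C bodies stand in for the assembly implementations.

```c
#include <assert.h>
#include <stddef.h>

/* Same condition configure evaluates via test_cpp_condition: true when the
 * compiler may emit AVX anywhere (e.g. GCC/Clang -mavx, MSVC /arch:AVX). */
#if defined(__AVX__) && __AVX__
#define SKETCH_ASSUME_AVX 1
#else
#define SKETCH_ASSUME_AVX 0
#endif

/* Hypothetical flag bits standing in for AV_CPU_FLAG_SSE2 / AV_CPU_FLAG_AVX. */
enum { SKETCH_FLAG_SSE2 = 1 << 0, SKETCH_FLAG_AVX = 1 << 1 };

typedef void (*sketch_add_fn)(int *dst, const int *src, size_t n);

/* Plain C bodies stand in for the C, SSE2 and AVX implementations. */
static void add_c(int *dst, const int *src, size_t n)
{
    assert(n == 0 || (dst && src)); /* NULL is only allowed for empty arrays */
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

#if !SKETCH_ASSUME_AVX
/* Pre-AVX version: not compiled at all when AVX is assumed at build time. */
static void add_sse2(int *dst, const int *src, size_t n) { add_c(dst, src, n); }
#endif

static void add_avx(int *dst, const int *src, size_t n) { add_c(dst, src, n); }

/* Later (higher ISA) assignments overwrite earlier ones. On an AVX-capable
 * CPU with unmasked flags the AVX assignment always wins, which is why the
 * SSE2 branch can be compiled out when AVX is guaranteed. */
static sketch_add_fn sketch_add_init(int cpu_flags)
{
    sketch_add_fn fn = add_c;
#if !SKETCH_ASSUME_AVX
    if (cpu_flags & SKETCH_FLAG_SSE2)
        fn = add_sse2;
#endif
    if (cpu_flags & SKETCH_FLAG_AVX)
        fn = add_avx;
    return fn;
}
```

In a non-AVX build, sketch_add_init(SKETCH_FLAG_SSE2) still selects add_sse2; with __AVX__ defined it falls back to add_c instead. That only matters if the runtime flags were masked to exclude AVX (for example via ffmpeg's -cpuflags option), and it is one trade-off to weigh before removing such functions.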


From ccaa1195c3e52b010946058b4bfa1d1ca85b9c2d Mon Sep 17 00:00:00 2001
From: Andreas Rheinhardt <[email protected]>
Date: Sun, 14 Jun 2026 00:44:36 +0200
Subject: [PATCH] avutil/x86/x86util: Force VEX encoding when using -mavx

This avoids reg-reg moves and saves 90112B of .text here.
It also makes the code less reliant on a clean upper ymm state.

Not all functions use VEX encoding yet. Besides inline assembly
functions, which are not influenced by x86inc.asm, there are also
functions that mix xmm and mmx registers (e.g. in h264_intrapred.asm)
and use INIT_MMX, for which the automatic VEX translation is not
active. This means that some parts of the code still rely on a clean
upper ymm state.

Signed-off-by: Andreas Rheinhardt <[email protected]>
---
 configure                 | 3 +++
 libavutil/x86/x86util.asm | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/configure b/configure
index e67aa362ad..ef07d4895b 100755
--- a/configure
+++ b/configure
@@ -2445,6 +2445,7 @@ ARCH_FEATURES="
     simd_align_16
     simd_align_32
     simd_align_64
+    x86_sse2avx
 "
 
 BUILTIN_LIST="
@@ -6892,6 +6893,8 @@ EOF
 
     check_cc intrinsics_sse2 emmintrin.h "__m128i test = _mm_setzero_si128()"
 
+    test_cpp_condition stddef.h "defined(__AVX__) && __AVX__" && enable x86_sse2avx
+
 elif enabled loongarch; then
     enabled lsx && check_inline_asm lsx '"vadd.b $vr0, $vr1, $vr2"' '-mlsx' && append LSXFLAGS '-mlsx'
     enabled lasx && check_inline_asm lasx '"xvadd.b $xr0, $xr1, $xr2"' '-mlasx' && append LASXFLAGS '-mlasx'
diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index da41e2e5ef..6632155c99 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -27,6 +27,10 @@
 %define public_prefix  avpriv
 %define cpuflags_mmxext cpuflags_mmx2
 
+%if HAVE_X86_SSE2AVX
+%define FORCE_VEX_ENCODING 1
+%endif
+
 %include "libavutil/x86/x86inc.asm"
 
 ; expands to [base],...,[base+7*stride]
-- 
2.52.0

_______________________________________________
ffmpeg-devel mailing list -- [email protected]