aarch64: Add neon implementation for sse16

Martin Storsjö Thu, 18 Aug 2022 02:09:52 -0700

On Tue, 16 Aug 2022, Hubert Mazur wrote:

Provide a NEON implementation of the sse16 function.


Performance comparison tests are shown below.
- sse_0_c: 268.2
- sse_0_neon: 43.5

Benchmarks and tests were run with the checkasm tool on AWS Graviton 3.

Signed-off-by: Hubert Mazur <h...@semihalf.com>
---
libavcodec/aarch64/me_cmp_init_aarch64.c |  4 ++
libavcodec/aarch64/me_cmp_neon.S         | 76 ++++++++++++++++++++++++
2 files changed, 80 insertions(+)

diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c
index 79c739914f..7780009d41 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -30,6 +30,9 @@ int ff_pix_abs16_xy2_neon(MpegEncContext *s, const uint8_t *blk1, const uint8_t
int ff_pix_abs16_x2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
                      ptrdiff_t stride, int h);

+int sse16_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
+                      ptrdiff_t stride, int h);
+

The second line of the function declaration is incorrectly indented (it should be aligned with the opening parenthesis). I fixed this for the preexisting cases and for the new patches that I pushed.
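For illustration, a minimal scalar sketch in C that follows this alignment rule. The name sse16_ref and the opaque MpegEncContext typedef are hypothetical stand-ins, not FFmpeg's actual C code. The body assumes sse16 returns the sum of squared differences over a block 16 pixels wide and h rows tall, with both pointers advanced by stride per row.

```c
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-in for the context type; it is never dereferenced here. */
typedef struct MpegEncContext MpegEncContext;

/* Hypothetical scalar reference with the same parameter list as sse16_neon.
 * The continuation line is aligned with the first parameter, i.e. one
 * column past the opening parenthesis. */
int sse16_ref(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
              ptrdiff_t stride, int h)
{
    int sum = 0;
    (void)v; /* unused, like x0 in the assembly */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int d = pix1[x] - pix2[x];
            sum += d * d; /* squared difference of one pixel pair */
        }
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```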

diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index cda7ce0408..825ce45d13 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -270,3 +270,79 @@ function ff_pix_abs16_x2_neon, export=1

        ret
endfunc
+
+function sse16_neon, export=1
+        // x0 - unused
+        // x1 - pix1
+        // x2 - pix2
+        // x3 - stride
+        // w4 - h
+
+        cmp             w4, #4
+        movi            d18, #0


The d18 register is essentially unused; it is zeroed here and only used to hold the final sum.

+3:
+        uaddlv          d16, v17.4s                     // add up accumulator vector
+        add             d18, d18, d16
+
+        fmov            w0, s18


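Here, the d18 register could be left out entirely. Since it is zero, the add does nothing, and the result can be moved straight out of d16 (fmov w0, s16).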
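A rough C model of the reduction makes the point concrete. This is a sketch, not the patch's code: sse16_lanes_model is a hypothetical name, the four uint32_t lanes stand in for v17.4s, and which pixel lands in which lane is purely illustrative. The lanes are summed once at the end, as uaddlv does, and that total is returned directly with no separately zeroed scalar.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: four 32-bit lane accumulators (like v17.4s),
 * reduced once at the end (like uaddlv) and returned directly. */
int sse16_lanes_model(const uint8_t *pix1, const uint8_t *pix2,
                      ptrdiff_t stride, int h)
{
    uint32_t acc[4] = {0, 0, 0, 0}; /* like v17.4s, starts at zero */
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int d = pix1[x] - pix2[x];
            acc[x % 4] += (uint32_t)(d * d); /* illustrative lane choice */
        }
        pix1 += stride;
        pix2 += stride;
    }
    /* like uaddlv d16, v17.4s: widen and add all lanes into 64 bits */
    uint64_t total = (uint64_t)acc[0] + acc[1] + acc[2] + acc[3];
    /* like fmov w0, s16: return the low 32 bits; no extra zeroed
     * accumulator is needed, since adding zero would not change total */
    return (int)(uint32_t)total;
}
```

For a 16-wide block the total never exceeds 16 * 255 * 255 * h, so for the block heights this function is used with it fits in the low 32 bits that are returned.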

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 1/5] lavc/aarch64: Add neon implementation for sse16
