On Tue, 16 Aug 2022, Hubert Mazur wrote:
Provide neon implementation for sse16 function.
Performance comparison tests are shown below.
- sse_0_c: 268.2
- sse_0_neon: 43.5
Benchmarks and tests run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <h...@semihalf.com>
---
libavcodec/aarch64/me_cmp_init_aarch64.c | 4 ++
libavcodec/aarch64/me_cmp_neon.S | 76 ++++++++++++++++++++++++
2 files changed, 80 insertions(+)
diff --git a/libavcodec/aarch64/me_cmp_init_aarch64.c b/libavcodec/aarch64/me_cmp_init_aarch64.c
index 79c739914f..7780009d41 100644
--- a/libavcodec/aarch64/me_cmp_init_aarch64.c
+++ b/libavcodec/aarch64/me_cmp_init_aarch64.c
@@ -30,6 +30,9 @@ int ff_pix_abs16_xy2_neon(MpegEncContext *s, const uint8_t *blk1, const uint8_t
int ff_pix_abs16_x2_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
ptrdiff_t stride, int h);
+int sse16_neon(MpegEncContext *v, const uint8_t *pix1, const uint8_t *pix2,
+ ptrdiff_t stride, int h);
+
The second line of the function declaration is incorrectly indented (it
should be aligned with the opening parenthesis). I fixed this for the
preexisting cases and for the new patches that I pushed.
diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
index cda7ce0408..825ce45d13 100644
--- a/libavcodec/aarch64/me_cmp_neon.S
+++ b/libavcodec/aarch64/me_cmp_neon.S
@@ -270,3 +270,79 @@ function ff_pix_abs16_x2_neon, export=1
ret
endfunc
+
+function sse16_neon, export=1
+ // x0 - unused
+ // x1 - pix1
+ // x2 - pix2
+ // x3 - stride
+ // w4 - h
+
+ cmp w4, #4
+ movi d18, #0
The d18 register was essentially unused
+3:
+ uaddlv d16, v17.4s // add up accumulator vector
+ add d18, d18, d16
+
+ fmov w0, s18
Here, the d18 register could be left out entirely.
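
For reference, here is a minimal scalar sketch of the value the routine
is expected to leave in w0: the sum of squared differences over a
16-pixel-wide block of h rows. The name sse16_ref is hypothetical and not
part of the patch, and the unused context argument is dropped. It is
only meant to illustrate that one horizontal add of the accumulator
already yields the final result, so no extra add into a zeroed register
is needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference; the name sse16_ref is not from the patch. */
static int sse16_ref(const uint8_t *pix1, const uint8_t *pix2,
                     ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        /* One row: 16 pixels, squared difference of each pair. */
        for (int x = 0; x < 16; x++) {
            int d = pix1[x] - pix2[x];
            sum += d * d;
        }
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

With h = 16, the sum is at most 16 * 16 * 255 * 255 = 16646400, which
fits comfortably in the 32 bits moved out by the final fmov.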
// Martin