On Fri, 28 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

Before and after:

A78
ac3_extract_exponents_n512_neon:                       503.2 ( 3.36x)
ac3_extract_exponents_n3072_neon:                     2986.2 ( 3.35x)

ac3_extract_exponents_n512_neon:                       211.2 ( 8.02x)
ac3_extract_exponents_n3072_neon:                     1251.5 ( 8.00x)

A72
ac3_extract_exponents_n512_neon:                       964.7 ( 2.39x)
ac3_extract_exponents_n3072_neon:                     5434.5 ( 2.47x)

ac3_extract_exponents_n512_neon:                       465.6 ( 4.87x)
ac3_extract_exponents_n3072_neon:                     2696.3 ( 4.97x)
---
This version handles 16 ints in one go and consolidates separate
extractions and writes into one. I assume the length of the input is a
multiple of 16 (there are no constraints defined in the template file),
but the tests are passing.

I don't know whether this assumption is OK or not (it may be good to check whether the other assembly implementations, e.g. on x86, make the same assumption). Code-wise, the patch looks good, thanks!
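
To make the length assumption concrete, here is a rough scalar sketch in C of what a 16-at-a-time loop relies on. It is not the patch and not the actual FFmpeg code. The function names (`highest_bit`, `exponent_of`, `extract_exponents_blocked`) are made up for illustration. It assumes the usual definition of the exponent: 24 for a zero coefficient, otherwise 23 minus the index of the highest set bit of the absolute value, with coefficients that fit in 24 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the highest set bit; v must be non-zero. */
static int highest_bit(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Assumed scalar definition: 24 for zero, else 23 - floor(log2(|coef|)).
 * Only meaningful while |coef| < 2^24. */
static uint8_t exponent_of(int32_t coef)
{
    /* Unsigned negation avoids undefined behaviour for INT32_MIN. */
    uint32_t v = coef < 0 ? 0u - (uint32_t)coef : (uint32_t)coef;
    return v ? (uint8_t)(23 - highest_bit(v)) : 24;
}

/* Processes the input in blocks of 16, the way a 16-lane SIMD loop would.
 * There is no tail loop, so nb_coefs must be a multiple of 16. */
static void extract_exponents_blocked(uint8_t *exp, const int32_t *coef,
                                      int nb_coefs)
{
    assert(nb_coefs % 16 == 0);
    for (int i = 0; i < nb_coefs; i += 16)
        for (int j = 0; j < 16; j++)
            exp[i + j] = exponent_of(coef[i + j]);
}
```

If the n512 and n3072 in the benchmark names are the input lengths, both are multiples of 16 (32 and 192 blocks), which would explain why the tests pass without exercising a leftover tail.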

This description of the patch (what it does and the assumptions it makes) would be nice to keep in the final commit message as well, so it could be included above the "---" too.

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
