On Fri, 28 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
Before and after:

A78
ac3_extract_exponents_n512_neon:   503.2 ( 3.36x)
ac3_extract_exponents_n3072_neon: 2986.2 ( 3.35x)

ac3_extract_exponents_n512_neon:   211.2 ( 8.02x)
ac3_extract_exponents_n3072_neon: 1251.5 ( 8.00x)

A72
ac3_extract_exponents_n512_neon:   964.7 ( 2.39x)
ac3_extract_exponents_n3072_neon: 5434.5 ( 2.47x)

ac3_extract_exponents_n512_neon:   465.6 ( 4.87x)
ac3_extract_exponents_n3072_neon: 2696.3 ( 4.97x)

---

This version handles 16 ints in one go and consolidates separate extractions and writes into one. I assume the length of the input is a multiple of 16 (there are no constraints defined in the template file), but the tests are passing.
I have no clue whether this assumption is ok or not (it may be good to check whether other assembly implementations, e.g. on x86, make the same assumption). Codewise, the patch looks good, thanks!
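To make the length assumption concrete, here is a minimal scalar sketch in C. It is not the actual FFmpeg code: the function names and the `log2_floor` helper are hypothetical, and it assumes the exponent is 23 minus the index of the highest set bit of |coef| (24 for a zero coefficient), with coefficient magnitudes below 2^24. The blocked variant only mirrors the "16 ints per iteration" shape of the NEON loop, so it rejects lengths that are not a multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the highest set bit; v must be nonzero. */
static int log2_floor(uint32_t v)
{
    int n = 0;
    while (v >>= 1)
        n++;
    return n;
}

/* Scalar sketch. Assumed semantics: exp = 23 - floor(log2(|coef|)),
 * or 24 when coef is zero. Assumes |coef| < 2^24. */
static void extract_exponents_scalar(uint8_t *exp, const int32_t *coef,
                                     int nb_coefs)
{
    for (int i = 0; i < nb_coefs; i++) {
        /* Absolute value computed in unsigned arithmetic to avoid overflow. */
        uint32_t v = coef[i] < 0 ? 0u - (uint32_t)coef[i]
                                 : (uint32_t)coef[i];
        exp[i] = v ? (uint8_t)(23 - log2_floor(v)) : 24;
    }
}

/* Blocked sketch: processes 16 coefficients per iteration, like the
 * vectorized loop would, and therefore requires nb_coefs % 16 == 0. */
static void extract_exponents_by16(uint8_t *exp, const int32_t *coef,
                                   int nb_coefs)
{
    assert(nb_coefs % 16 == 0);
    for (int i = 0; i < nb_coefs; i += 16)
        extract_exponents_scalar(exp + i, coef + i, 16);
}
```

If some caller can pass a length that is not a multiple of 16, the blocked loop would need a scalar tail (or the caller would need padded buffers), which is the thing worth checking against the other implementations.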
This description of the patch, what it does and the assumptions it makes, is probably nice to keep in the final commit as well, so it could be included above "---" too.
// Martin