On Fri, 28 Feb 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
Before and after:

A78
ac3_sum_square_bufferfly_int32_neon:  484.8 ( 2.00x)
ac3_sum_square_bufferfly_int32_neon:  468.2 ( 2.08x)

A72
ac3_sum_square_bufferfly_int32_neon:  793.6 ( 1.26x)
ac3_sum_square_bufferfly_int32_neon:  527.3 ( 1.92x)

---
Instead of calculating a^2, b^2, (a+b)^2 and (a-b)^2, calculate only
a^2, b^2 and 2*a*b in each iteration and derive the latter parts from
these three at the end.
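For readers following along, the identity behind the change can be sketched in scalar C. This is only an illustration, not the NEON code from the patch: the function names sums_direct and sums_derived, the output layout of four 64-bit sums, and the choice to accumulate a*b and double it once at the end are all assumptions made for the example. It relies on (a+b)^2 = a^2 + b^2 + 2*a*b and (a-b)^2 = a^2 + b^2 - 2*a*b, and assumes inputs small enough that the 64-bit accumulators do not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference: accumulate all four squares in every iteration. */
static void sums_direct(int64_t sum[4], const int32_t *a, const int32_t *b, int len)
{
    assert(len >= 0);
    sum[0] = sum[1] = sum[2] = sum[3] = 0;
    for (int i = 0; i < len; i++) {
        int64_t x = a[i], y = b[i];
        sum[0] += x * x;             /* a^2     */
        sum[1] += y * y;             /* b^2     */
        sum[2] += (x + y) * (x + y); /* (a+b)^2 */
        sum[3] += (x - y) * (x - y); /* (a-b)^2 */
    }
}

/* Hypothetical variant: accumulate a^2, b^2 and a*b only, then derive
 * the sums of (a+b)^2 and (a-b)^2 once after the loop. */
static void sums_derived(int64_t sum[4], const int32_t *a, const int32_t *b, int len)
{
    int64_t aa = 0, bb = 0, ab = 0;
    assert(len >= 0);
    for (int i = 0; i < len; i++) {
        int64_t x = a[i], y = b[i];
        aa += x * x;
        bb += y * y;
        ab += x * y;                 /* cross term, doubled after the loop */
    }
    sum[0] = aa;
    sum[1] = bb;
    sum[2] = aa + bb + 2 * ab;       /* sum of (a+b)^2 */
    sum[3] = aa + bb - 2 * ab;       /* sum of (a-b)^2 */
}
```

The loop body of the second variant needs three multiply-accumulates per element instead of four, and no additions or subtractions of the inputs before multiplying.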
This patch looks good to me, thanks!

// Martin