[FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

Sebastian Pop Mon, 25 Nov 2019 14:09:20 -0800

Hi,

This patch implements ff_hscale_8_to_15_neon with NEON fused multiply-accumulate
instructions and bumps the vectorization factor from 2 to 4. I have seen
speedups of up to 15% on Graviton A1 instances, which are based on Cortex-A72
CPUs.
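For readers unfamiliar with this function, here is a minimal C sketch of what
the 8-bit to 15-bit horizontal scaler computes. It is modeled on libswscale's
scalar C fallback (hScale8To15_c) and places it next to a variant that produces
four output samples per outer iteration, folding each product into its lane's
accumulator with one multiply-accumulate step. This is NOT the attached NEON
assembly. The function names are hypothetical, dstW is assumed to be a multiple
of 4, and the filter coefficients are assumed to sum to 1 << 14.

```c
#include <stdint.h>

/* Shift the accumulated sum down by 7 bits and clamp it to the 15-bit maximum. */
static int16_t clip15(int32_t val)
{
    int32_t v = val >> 7;
    return (int16_t)(v > 32767 ? 32767 : v);
}

/* Scalar reference (hypothetical name): one output sample per iteration. */
void hscale_8_to_15_ref(int16_t *dst, int dstW, const uint8_t *src,
                        const int16_t *filter, const int32_t *filterPos,
                        int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int32_t val = 0;
        for (int j = 0; j < filterSize; j++)
            val += (int32_t)src[filterPos[i] + j] * filter[filterSize * i + j];
        dst[i] = clip15(val);
    }
}

/* Vectorization factor 4 (hypothetical name): four output samples per outer
 * iteration, each kept in its own accumulator "lane". Every product is added
 * straight into its accumulator, which is what a per-lane integer
 * multiply-accumulate instruction (e.g. SMLAL on AArch64) would do.
 * Assumes dstW % 4 == 0. */
void hscale_8_to_15_x4(int16_t *dst, int dstW, const uint8_t *src,
                       const int16_t *filter, const int32_t *filterPos,
                       int filterSize)
{
    for (int i = 0; i < dstW; i += 4) {
        int32_t acc[4] = {0, 0, 0, 0};
        for (int j = 0; j < filterSize; j++)
            for (int lane = 0; lane < 4; lane++)
                acc[lane] += (int32_t)src[filterPos[i + lane] + j] *
                             filter[filterSize * (i + lane) + j];
        for (int lane = 0; lane < 4; lane++)
            dst[i + lane] = clip15(acc[lane]);
    }
}
```

Both functions produce identical output. The second one only changes how many
outputs are in flight at once, which is the property a wider NEON loop exploits.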


$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.037339 avg:0.037327 max:0.037550 min:0.036992
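The avg figures from the bench filter can be turned into a relative improvement
with simple arithmetic. Here is a small C sketch (the helper names are
hypothetical):

```c
/* Ratio of old to new time: values above 1.0 mean the new code is faster. */
double bench_speedup(double before, double after)
{
    return before / after;
}

/* Percentage of the original time that was saved. */
double bench_time_saved_pct(double before, double after)
{
    return 100.0 * (before - after) / before;
}
```

With avg 0.040287 s before and 0.037327 s after, this gives about a 1.08x
speedup, or about 7.3% less time, for this particular 4k to 1024x1024 run. That
is below the up-to-15% peak quoted above.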

Tested with `make check` on aarch64-linux.

0001-aarch64-use-FMA-and-increase-vector-factor-to-4.patch

_______________________________________________
ffmpeg-devel mailing list
[email protected]
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".
