flacenc: add AVX2 version of the 32-bit LPC encoder

Henrik Gramner Mon, 27 Nov 2017 09:13:32 -0800

>> Using 128-bit broadcasts is preferable to duplicating the constants
>> to 256-bit unless there's a good reason for doing so, since it wastes
>> less cache and is faster on AMD CPUs.
>
> What would that reason be? AFAIK broadcasts are expensive, since they
> both load from memory and then splat data across lanes. Using them inside
> loops doesn't sound like a good idea. But I guess you have more
> experience testing on a wider variety of chips than I do.


128-bit broadcasts from memory are handled by the load unit itself, at
no extra cost, on all AVX2-capable CPUs, so no separate shuffle is
needed to splat the data across lanes.
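To make the trade-off concrete, here is a minimal C sketch using AVX2 intrinsics rather than the x86inc assembly the patch itself uses. The table `coefs128`, its duplicated twin `coefs256`, and the helper `broadcast_matches_duplicate()` are hypothetical names chosen for illustration, and the sketch assumes GCC or Clang on x86 (for `__attribute__((target("avx2")))`). It shows that a 16-byte constant broadcast to both 128-bit lanes yields the same ymm value as a 32-byte duplicated constant, while taking half the space in read-only data and therefore in the cache.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical constant table: four int32 values stored once (16 bytes). */
static const int32_t coefs128[4] __attribute__((aligned(16))) = { 1, -2, 3, -4 };

/* The same four values duplicated to fill a full ymm register (32 bytes). */
static const int32_t coefs256[8] __attribute__((aligned(32))) = {
    1, -2, 3, -4, 1, -2, 3, -4
};

/* Returns 1 if broadcasting the 16-byte table gives the same 256-bit vector
 * as loading the 32-byte duplicated table. Only call this on CPUs with AVX2
 * (check __builtin_cpu_supports("avx2") first). */
__attribute__((target("avx2")))
static int broadcast_matches_duplicate(void)
{
    /* 128-bit broadcast: one 16-byte load, copied into both 128-bit lanes.
     * Compilers may fold this pair into a single vbroadcasti128 ymm, m128. */
    __m128i lo    = _mm_load_si128((const __m128i *)coefs128);
    __m256i bcast = _mm256_broadcastsi128_si256(lo);

    /* Plain 256-bit load of the duplicated table. */
    __m256i dup = _mm256_load_si256((const __m256i *)coefs256);

    /* Compare all eight 32-bit elements; all 32 mask bits set means equal. */
    __m256i eq = _mm256_cmpeq_epi32(bcast, dup);
    return _mm256_movemask_epi8(eq) == -1;
}
```

On an AVX2-capable CPU, `broadcast_matches_duplicate()` should return 1. The equality check is only a sanity test; the point is that the broadcast form needs 16 bytes of constant data per set of coefficients instead of 32.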

> Also, by AMD CPUs, do you mean Ryzen? On Bulldozer-based CPUs we
> purposely disabled functions that use ymm registers.

Yes. 128-bit broadcasts have twice the throughput of 256-bit loads on
Ryzen, since it only has 128-bit load units.
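The 2x figure follows from simple load-port arithmetic. As a rough model, assume (for illustration, not measured here) that a Ryzen core can perform two 128-bit loads per cycle, that a 256-bit load is split into two 128-bit load operations, and that a 128-bit broadcast costs a single 128-bit load:

$$
\frac{2 \times 128\ \text{bits/cycle}}{256\ \text{bits per load}} = 1\ \text{256-bit load/cycle},
\qquad
\frac{2 \times 128\ \text{bits/cycle}}{128\ \text{bits per broadcast}} = 2\ \text{broadcasts/cycle}
$$

Under this model each broadcast fills a whole ymm register while consuming half the load bandwidth of a 256-bit load. On a CPU with 256-bit load units both forms would cost one load operation each, so there the remaining benefit is the smaller cache footprint.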
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder
