On 6/14/2021 8:53 AM, Ronald S. Bultje wrote:
> Hi Alan,
>
> On Mon, Jun 14, 2021 at 7:20 AM Alan Kelly <
> alankelly-at-google....@ffmpeg.org> wrote:
>
>> Broadwell and later have fast gather instructions.
>> ---
>>   This is so that the avx2 version of ff_hscale8to15X which uses gather
>>   instructions is only selected on machines where it will actually be
>>   faster.
>
> We've in the past typically done this with a bit in the cpuflags return
> value. Can this be added there instead of being its own function?
>
> Also, what is the cycle count of ssse3/avx2 implementation for this
> specific function on Haswell? It would be good to note that in the
> respective patch so that we understand why the check was added.

Between 9 and 12 on Haswell, 5 to 7 on Broadwell, and about 2 to 5 on Skylake and newer, according to Agner's pdf, if I'm reading it right. It's also slow on AMD before Zen 3.

And yes, this should if anything be a new cpu flag and not a new function.
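
To illustrate what a flag-based check could look like, here is a minimal sketch. The flag names, bit values, and the select_hscale() helper are all hypothetical and made up for this example; they are not the actual cpuflags definitions or the swscale init code. The idea is only that a "slow gather" bit set at CPU detection time lets the init code skip the avx2 gather version and fall back to ssse3.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only (not real cpuflags values). */
#define CPU_FLAG_SSSE3       (1u << 0)
#define CPU_FLAG_AVX2        (1u << 1)
#define CPU_FLAG_SLOW_GATHER (1u << 2) /* assumed to be set on CPUs with slow gathers */

/* Hypothetical tags standing in for the function pointers init code would set. */
typedef enum { HSCALE_C, HSCALE_SSSE3, HSCALE_AVX2 } HscaleImpl;

/* Pick the gather-based avx2 version only when gathers are not flagged slow. */
static HscaleImpl select_hscale(uint32_t flags)
{
    if ((flags & CPU_FLAG_AVX2) && !(flags & CPU_FLAG_SLOW_GATHER))
        return HSCALE_AVX2;
    if (flags & CPU_FLAG_SSSE3)
        return HSCALE_SSSE3;
    return HSCALE_C;
}
```

With this shape, the decision stays in one bitmask test next to the other cpuflags checks, and no separate detection function has to be called from the scaler init.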
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".