Re: [libav-devel] [PATCH 2/5] arm: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 (alternative 2)

Martin Storsjö Sun, 05 Feb 2017 03:38:42 -0800

On Sun, 5 Feb 2017, Janne Grunau wrote:

On 2017-02-05 00:34:16 +0200, Martin Storsjö wrote:

On Sat, 4 Feb 2017, Janne Grunau wrote:


>I'm not really sure which variant I prefer. Is the speed difference
>measurable for idct heavy real world samples? If you have a preference for
>one or the other variant I trust your judgement.

It's measurable, but it's not much. For one sample, I originally got a full
decode time like this (fastest time out of 2 runs) with the current master:
user    2m53.980s
Alternative 1:
user    2m53.448s
Alternative 2:
user    2m52.952s


What is the approximate share of the idct in the whole decoding time?

It isn't very conclusive, but only around 3% or so, based on a run with
perf and adding up all the highest-scoring idct functions.

So alternative 2 is better, but produces binaries a couple of KB bigger and
more duplicated code. (OTOH, it also allows more exact special-casing of
minor details.)

I originally clearly preferred alt 2, but with your suggestions for alt 1,
the diff for that one ends up very small and neat.

I think the numbers look pretty compelling for alternative 2. 1s vs. 0.5s
overall decoding speedup. The difference is larger than I expected and imo
justifies the code duplication and increased binary size. While the patch
for alternative 1 looks small and nice, that's not really an argument. The
patch for alternative 2 would also look nicer if you did the macro move in
a separate patch.
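
For reference, the "1s vs. 0.5s" figures can be reproduced from the timings quoted above. The sketch below is only illustrative arithmetic: the helper names (`parse_user_time`, `summarize`) are made up for this example, and the last column assumes that idct really is about 3% of the decode time on master and that the whole saving comes from the idct functions, which the rough perf estimate in this thread does not guarantee.

```python
def parse_user_time(text):
    """Convert a `time`-style string such as '2m53.980s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# User times quoted in the thread (fastest time out of 2 runs).
timings = {
    "master": "2m53.980s",
    "alternative 1": "2m53.448s",
    "alternative 2": "2m52.952s",
}

# Assumption: idct is roughly 3% of total decode time (rough perf estimate).
IDCT_SHARE = 0.03


def summarize(timings, baseline="master", share=IDCT_SHARE):
    """Return seconds saved, overall speedup and implied idct speedup."""
    base = parse_user_time(timings[baseline])
    rows = {}
    for name, text in timings.items():
        if name == baseline:
            continue
        saved = base - parse_user_time(text)
        rows[name] = {
            "saved_s": round(saved, 3),
            # Saving as a percentage of the whole decode time.
            "overall_pct": round(100 * saved / base, 2),
            # Saving as a percentage of the assumed idct time only.
            "idct_pct": round(100 * saved / (base * share), 1),
        }
    return rows


for name, row in summarize(timings).items():
    print(name, row)
```

With these inputs, alternative 1 saves about 0.53 s (roughly 0.3% overall) and alternative 2 about 1.03 s (roughly 0.6% overall). Under the 3% assumption that would correspond to something like 10% and 20% less time spent in idct, but since only two runs were timed, these should be read as ballpark figures.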


Sure, I can try to split out that part to make it more readable as well.

// Martin

Re: [libav-devel] [PATCH 2/5] arm: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 (alternative 2)