On Sun, 5 Feb 2017, Janne Grunau wrote:

On 2017-02-05 00:34:16 +0200, Martin Storsjö wrote:
On Sat, 4 Feb 2017, Janne Grunau wrote:

>I'm not really sure which variant I prefer. Is the speed difference
>measurable for idct heavy real world samples? If you have a preference for one
>or the other variant I trust your judgement.

It's measurable, but it's not much. For one sample, I originally got a full
decode time like this (fastest time out of 2 runs) with the current master:
user    2m53.980s
Alternative 1:
user    2m53.448s
Alternative 2:
user    2m52.952s

What is the approximate share of the idct in the whole decoding time?

The measurement isn't very conclusive, but it's only around 3% or so, based on a run with perf and adding up all the highest scoring idct functions.

So alternative 2 is faster, but produces binaries a couple of KB bigger, and
more duplicated code. (OTOH it also allows more exact special casing of minor
details.)

I originally clearly preferred alt 2, but with your suggestions for alt 1,
the diff for that one ends up very small and neat.

I think the numbers look pretty compelling for alternative 2: a 1s vs. 0.5s overall decoding speedup. The difference is larger than I expected and imo justifies the code duplication and increased binary size. While the patch for alternative 1 looks small and nice, that's not really an argument. The patch for alternative 2 would also look nicer if you did the macro move in a separate patch.
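
For reference, the arithmetic behind the "1s vs. 0.5s" figures can be sketched as below. This is only an illustration: the three timings are the `user` values quoted above, the 3% idct share is the rough perf estimate from this thread, and the helper names (`parse_user_time`, `savings`, `IDCT_SHARE`) are made up for the example.

```python
def parse_user_time(text):
    """Convert a `time`-style string such as '2m53.980s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "user" times quoted in the thread (fastest of 2 runs each).
timings = {
    "master": parse_user_time("2m53.980s"),
    "alternative 1": parse_user_time("2m53.448s"),
    "alternative 2": parse_user_time("2m52.952s"),
}

# Assumption: idct is about 3% of total decode time (rough perf estimate).
IDCT_SHARE = 0.03


def savings(name):
    """Return (seconds saved, fraction of total, fraction of idct time)."""
    base = timings["master"]
    saved = base - timings[name]
    return saved, saved / base, saved / (base * IDCT_SHARE)


for name in ("alternative 1", "alternative 2"):
    saved, overall, idct = savings(name)
    print(f"{name}: {saved:.3f}s saved, {overall:.2%} overall, "
          f"~{idct:.0%} of idct time")
```

Under these assumptions the overall savings are roughly 0.3% and 0.6%. Measured against a ~3% idct share, that would be on the order of 10% and 20% of the idct time itself. Since the share is a rough estimate and each timing is the best of only 2 runs, these are ballpark figures at most.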

Sure, I can try to split out that part to make it more readable as well.

// Martin
