On Sun, 5 Feb 2017, Janne Grunau wrote:
On 2017-02-05 00:34:16 +0200, Martin Storsjö wrote:
On Sat, 4 Feb 2017, Janne Grunau wrote:
>I'm not really sure which variant I prefer. Is the speed difference
>measurable for idct heavy real world samples? If you have a preference for
>one or the other variant I trust your judgement.
It's measurable, but it's not much. For one sample, I originally got a full
decode time like this (fastest time out of 2 runs) with the current master:
user 2m53.980s
Alternative 1:
user 2m53.448s
Alternative 2:
user 2m52.952s
What is the approximate share of the idct in the whole decoding time?
It doesn't seem to be very conclusive, but only around 3% or so,
based on a run with perf, adding up all the highest scoring idct
functions.
So alternative 2 is better, but produces a couple KB bigger binaries, and
more duplicated code. (OTOH also allowing more exact special casing of minor
details.)
I originally clearly preferred alt 2, but with your suggestions for alt 1,
the diff for that one ends up very small and neat.
I think the numbers look pretty compelling for alternative 2. 1s vs.
0.5s overall decoding speedup. The difference is larger than I expected
and imo justifies the code duplication and increased binary size. While
the patch for alternative 1 looks small and nice, that's not really an
argument. The patch for alternative 2 would also look nicer if you did
the macro move in a separate patch.
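
As a rough sanity check of the numbers quoted above, the following sketch
converts the three "user" timings to seconds and estimates how much faster
the idct code itself would have to be to explain the savings. It assumes
that the idct share is 3% of the decode time on master (the rough perf
estimate from this thread) and that the whole saving comes from the idct
functions. The names to_seconds, summarize and IDCT_SHARE are made up for
this illustration.

```python
# Timings quoted in the thread ("user" time, fastest of 2 runs).
def to_seconds(minutes, seconds):
    """Convert a 'XmY.ZZZs' style timing to seconds."""
    return minutes * 60 + seconds

master = to_seconds(2, 53.980)
alt1 = to_seconds(2, 53.448)
alt2 = to_seconds(2, 52.952)

# Assumption: idct accounts for about 3% of total decode time on master.
IDCT_SHARE = 0.03

def summarize(t, baseline=master, share=IDCT_SHARE):
    """Return (seconds saved, overall speedup in %, implied idct speedup in %).

    The implied idct figure assumes the entire saving comes from idct code.
    """
    saved = baseline - t
    overall_pct = 100 * saved / baseline
    idct_pct = 100 * saved / (baseline * share)
    return saved, overall_pct, idct_pct

for name, t in (("Alternative 1", alt1), ("Alternative 2", alt2)):
    saved, overall_pct, idct_pct = summarize(t)
    print(f"{name}: saved {saved:.3f}s, "
          f"{overall_pct:.2f}% overall, ~{idct_pct:.0f}% of idct time")
```

Under those assumptions, alternative 1 corresponds to roughly a 10% reduction
in idct time and alternative 2 to roughly 20%. With only two runs per variant
and a 3% share that is itself an estimate, these are ballpark figures at best.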
Sure, I can try to split out that part to make it more readable as well.
// Martin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel