On Thu, 9 Feb 2017, Janne Grunau wrote:
On 2017-02-06 00:16:41 +0200, Martin Storsjö wrote:
Ok, so after running a slightly shorter clip (which seems to spend about as
large a percentage of its runtime in the IDCT as the previous one) with a few
more iterations, I've got the following results (the 'user' part from 'time
avconv -threads 1 -i foo -f null -'):
32 orig   32 alt1   32 alt2   64 orig   64 alt1   64 alt2
40.436s   40.148s   40.008s   37.428s   37.356s   37.192s
40.596s   40.140s   40.216s   37.572s   37.524s   37.384s
40.512s   40.228s   40.188s   37.740s   37.588s   37.368s
40.584s   40.136s   40.216s   37.880s   37.492s   37.348s
40.572s   40.292s   40.232s   37.756s   37.556s   37.676s
40.764s   40.312s   40.232s   37.876s   37.640s   37.468s
40.688s   40.284s   40.368s   37.972s   37.608s   37.460s
So while alt2 is faster in most runs, the margin is not quite as big as in
the previous benchmark. (The benchmarks were done on a practically unloaded
system so it shouldn't vary too much from run to run, but in practice, the
first few runs seem to be slightly faster than the later ones.)
I.e. around 400 ms gain out of 40 s for alt1, and then another -50 to +150 ms
speedup on top of that for alt2.
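As a rough cross-check of those figures, here is a minimal Python sketch that
averages each column of the table and reports the mean gain per variant. It is
not part of the original measurements; the names TIMES and mean_gain_ms are
made up for illustration, and the timings are simply copied from the table
above.

```python
from statistics import mean

# 'user' times in seconds, copied from the table above (7 runs per variant).
TIMES = {
    "32 orig": [40.436, 40.596, 40.512, 40.584, 40.572, 40.764, 40.688],
    "32 alt1": [40.148, 40.140, 40.228, 40.136, 40.292, 40.312, 40.284],
    "32 alt2": [40.008, 40.216, 40.188, 40.216, 40.232, 40.232, 40.368],
    "64 orig": [37.428, 37.572, 37.740, 37.880, 37.756, 37.876, 37.972],
    "64 alt1": [37.356, 37.524, 37.588, 37.492, 37.556, 37.640, 37.608],
    "64 alt2": [37.192, 37.384, 37.368, 37.348, 37.676, 37.468, 37.460],
}


def mean_gain_ms(baseline, candidate):
    """Mean time saved by `candidate` relative to `baseline`, in milliseconds."""
    return (mean(TIMES[baseline]) - mean(TIMES[candidate])) * 1000.0


for arch in ("32", "64"):
    alt1 = mean_gain_ms(f"{arch} orig", f"{arch} alt1")
    alt2 = mean_gain_ms(f"{arch} alt1", f"{arch} alt2")
    print(f"{arch}-bit: alt1 saves {alt1:.0f} ms vs orig, "
          f"alt2 another {alt2:.0f} ms vs alt1")
```

With only seven runs per variant and the drift between early and late runs
mentioned above, the means are only indicative; comparing per-run differences
or medians would be an equally reasonable choice.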
What do you think?
At least it looks like the difference between alt1 and alt2 is quite
similar on 32- and 64-bit. So we should use the same variant on both
archs. I favor alternative 2.
Ok then - I'll try to polish up and push alternative 2 based on the
feedback I got.
// Martin