On Thu, 9 Feb 2017, Janne Grunau wrote:

On 2017-02-06 00:16:41 +0200, Martin Storsjö wrote:

Ok, so after running a slightly shorter clip (which seems to spend about as
large a percentage of its runtime in IDCT as the previous one) with a few more
iterations, I've got the following results (the 'user' part from 'time
avconv -threads 1 -i foo -f null -'):

32 orig   32 alt1   32 alt2   64 orig   64 alt1   64 alt2
40.436s   40.148s   40.008s   37.428s   37.356s   37.192s
40.596s   40.140s   40.216s   37.572s   37.524s   37.384s
40.512s   40.228s   40.188s   37.740s   37.588s   37.368s
40.584s   40.136s   40.216s   37.880s   37.492s   37.348s
40.572s   40.292s   40.232s   37.756s   37.556s   37.676s
40.764s   40.312s   40.232s   37.876s   37.640s   37.468s
40.688s   40.284s   40.368s   37.972s   37.608s   37.460s

So while alt2 is faster in most runs, the margin is not quite as big as in
the previous benchmark. (The benchmarks were done on a practically unloaded
system so it shouldn't vary too much from run to run, but in practice, the
first few runs seem to be slightly faster than the later ones.)

I.e. around 400 ms gain out of 40 s for alt1, and then another -50 to +150 ms
speedup on top of that for alt2.
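
A minimal sketch of how such deltas can be derived from the table above,
assuming the per-column mean is a reasonable summary of the seven runs. The
names TIMINGS and mean_gain_ms are made up for this illustration and are not
part of any libav tooling; the numbers are copied from the table.

```python
from statistics import mean

# 'user' times in seconds, one list per column of the table above.
TIMINGS = {
    "32 orig": [40.436, 40.596, 40.512, 40.584, 40.572, 40.764, 40.688],
    "32 alt1": [40.148, 40.140, 40.228, 40.136, 40.292, 40.312, 40.284],
    "32 alt2": [40.008, 40.216, 40.188, 40.216, 40.232, 40.232, 40.368],
    "64 orig": [37.428, 37.572, 37.740, 37.880, 37.756, 37.876, 37.972],
    "64 alt1": [37.356, 37.524, 37.588, 37.492, 37.556, 37.640, 37.608],
    "64 alt2": [37.192, 37.384, 37.368, 37.348, 37.676, 37.468, 37.460],
}


def mean_gain_ms(baseline, candidate, timings=TIMINGS):
    """Mean user-time saving of `candidate` over `baseline`, in milliseconds.

    A positive value means `candidate` is faster on average.
    """
    return (mean(timings[baseline]) - mean(timings[candidate])) * 1000.0


if __name__ == "__main__":
    for arch in ("32", "64"):
        orig_to_alt1 = mean_gain_ms(f"{arch} orig", f"{arch} alt1")
        alt1_to_alt2 = mean_gain_ms(f"{arch} alt1", f"{arch} alt2")
        print(f"{arch}-bit: alt1 saves {orig_to_alt1:.0f} ms over orig, "
              f"alt2 saves a further {alt1_to_alt2:.0f} ms")
```

With only seven runs per column and a visible drift between early and late
runs, the means are a rough guide rather than a rigorous comparison.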

What do you think?

At least it looks like the difference between alt1 and alt2 is quite similar on 32- and 64-bit, so we should use the same variant on both archs. I favor alternative 2.

Ok then - I'll try to polish up and push alternative 2 based on the feedback I got.

// Martin
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
