On Wed, 31 Jul 2013 14:14:02 +0100, Hendrik Leppkes <[email protected]> wrote:
> Did you measure the overhead from the extra call, without any special
> asm enhanced versions?

I was rather hoping nobody would ask that, to save me the trouble of
having to go back and re-profile them. The truth is that I only split
patches 1, 2 and 3 apart in preparation for publishing the patch
series. The benchmarks in patch 3 refer to the combined effect of patches
1-3 - if you recall, that was an overall 6% speedup.

Profiling patch 2 in isolation does actually lead to a 5% regression,
though this is more than compensated for by the fact that patch 3 by itself
results in an 11% speedup. Of course, patch 3 is ARM-only. Other
architectures will hopefully find that any regression due to patch 2 is
compensated for by patches 4-6, plus there's also the option to write
versions of patch 3 targeted at them.

It should be noted that the ARM patch uses a more advanced algorithm for
detecting start codes, looking for 16-bit patterns rather than 8-bit
patterns as the C code does, so it returns far fewer false positives,
each of which would have resulted in a penalty corresponding to the
function call dispatch time. I would have adapted the C version to use the
same algorithm, but it relies upon being able to read multibyte data in
network byte order, and this doesn't really lend itself to being expressed
in C.
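
To illustrate the false-positive point only, here is a rough sketch rather
than the code from either patch. It assumes that an "8-bit pattern" means a
single zero byte and a "16-bit pattern" means two consecutive zero bytes,
which is my reading of the 00 00 01 start code prefix. All function names
are hypothetical, and the 16-bit search is written byte by byte, so it does
not reproduce the multibyte network-byte-order reads the ARM version relies
on. It also ignores candidates that straddle the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-bit search: stop at the first zero byte.
 * Returns size if no candidate is found. */
static size_t find_candidate_8bit(const uint8_t *buf, size_t size)
{
    size_t i;
    for (i = 0; i < size; i++)
        if (buf[i] == 0)
            return i;
    return size;
}

/* Hypothetical 16-bit search: stop only at two consecutive zero bytes.
 * Returns size if no candidate is found. */
static size_t find_candidate_16bit(const uint8_t *buf, size_t size)
{
    size_t i;
    for (i = 0; i + 1 < size; i++)
        if (buf[i] == 0 && buf[i + 1] == 0)
            return i;
    return size;
}

typedef size_t (*find_fn)(const uint8_t *buf, size_t size);

/* Count how often the search returns a candidate to its caller.
 * Each return is one trip through the function call dispatch. */
static unsigned count_candidates(find_fn find, const uint8_t *buf, size_t size)
{
    unsigned n = 0;
    size_t pos = 0;
    while (pos < size) {
        size_t off = find(buf + pos, size - pos);
        if (off == size - pos)
            break;              /* nothing left in the buffer */
        n++;
        pos += off + 1;         /* resume just past this candidate */
    }
    return n;
}

/* Made-up input: three isolated zero bytes, then one 00 00 01 prefix. */
static void check_example(void)
{
    static const uint8_t buf[] = {
        0x12, 0x00, 0x34, 0x00, 0x56, 0x00, 0x00, 0x01, 0x65
    };
    assert(count_candidates(find_candidate_8bit, buf, sizeof(buf)) == 4);
    assert(count_candidates(find_candidate_16bit, buf, sizeof(buf)) == 1);
}
```

On that made-up buffer the 8-bit search hands back four candidates for one
real start code, while the 16-bit search hands back one. How large the gap
is on real streams depends on the data.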

Ben