On Wed, 31 Jul 2013 14:14:02 +0100, Hendrik Leppkes <[email protected]> wrote:
> Did you measure the overhead from the extra call, without any special asm enhanced versions?
I was rather hoping nobody would ask that, to save me the trouble of having to go back and re-profile them. The truth is that I only split patches 2 and 3 (and 1) apart in preparation for publishing the patch series. The benchmarks in patch 3 refer to the combined effect of patches 1-3 - if you recall, that was an overall 6% speedup.

Profiling patch 2 in isolation does actually lead to a 5% regression, though this is more than compensated for by the fact that patch 3 by itself results in an 11% speedup.

Of course, patch 3 is ARM only. Other architectures will hopefully find that any regression due to patch 2 is compensated for by patches 4-6, plus there's also the option to write versions of patch 3 targeted at them.

It should be noted that the ARM patch uses a more advanced algorithm for detecting start codes, looking for 16-bit patterns rather than 8-bit patterns as the C code does, so it returns far fewer false positives, each of which would have resulted in a penalty corresponding to the function call dispatch time. I would have adapted the C version to use the same algorithm, but it relies upon being able to read multibyte data in network byte order, and this doesn't really lend itself to being expressed in C.

Ben
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
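
To illustrate why matching a 16-bit pattern reports fewer false positives than matching an 8-bit one, here is a minimal sketch in C. It is not the code from the patch series and not the ARM algorithm: the function names (find_zero_byte, find_zero_pair, is_start_code, scan) are hypothetical, the search is a plain byte-by-byte loop rather than multibyte loads in network byte order, and a start code is assumed to be the three bytes 00 00 01. Each candidate reported by a search function stands in for one return to the caller, which is where the dispatch penalty described above would be paid.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical search-function type: return the index of the next
 * candidate at or after pos, or size if there is none. */
typedef size_t (*find_fn)(const uint8_t *buf, size_t size, size_t pos);

/* 8-bit pattern: any single zero byte is reported as a candidate. */
size_t find_zero_byte(const uint8_t *buf, size_t size, size_t pos)
{
    for (size_t i = pos; i < size; i++)
        if (buf[i] == 0)
            return i;
    return size;
}

/* 16-bit pattern: only two consecutive zero bytes are reported. */
size_t find_zero_pair(const uint8_t *buf, size_t size, size_t pos)
{
    for (size_t i = pos; i + 1 < size; i++)
        if (buf[i] == 0 && buf[i + 1] == 0)
            return i;
    return size;
}

/* Assumed start code for this sketch: the three bytes 00 00 01. */
int is_start_code(const uint8_t *buf, size_t size, size_t i)
{
    return i + 2 < size && buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1;
}

/* Scan the buffer with the given search function. Returns the number of
 * start codes found and stores in *candidates how many positions the
 * search function reported; the difference is the false-positive count. */
size_t scan(const uint8_t *buf, size_t size, find_fn find, size_t *candidates)
{
    size_t pos = 0, found = 0;

    *candidates = 0;
    while (pos < size) {
        size_t i = find(buf, size, pos);
        if (i >= size)
            break;                 /* no further candidates */
        (*candidates)++;
        if (is_start_code(buf, size, i)) {
            found++;
            pos = i + 3;           /* skip past the start code */
        } else {
            pos = i + 1;           /* false positive: resume one byte on */
        }
    }
    return found;
}
```

Calling scan() twice on the same buffer, once with find_zero_byte and once with find_zero_pair, should find the same start codes, while the candidate count for the 16-bit search can only be equal or lower, since every zero pair also begins with a zero byte.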
