On Mon, Aug 5, 2013 at 5:29 AM, Martin Storsjö <[email protected]> wrote:
> On Wed, 31 Jul 2013, Ben Avison wrote:
>
>> On Wed, 31 Jul 2013 14:14:02 +0100, Hendrik Leppkes <[email protected]>
>> wrote:
>>>
>>> Did you measure the overhead from the extra call, without any special
>>> asm enhanced versions?
>>
>>
>> I was rather hoping nobody would ask that, to save me the trouble of
>> having to go back and re-profile them. The truth is that I only split
>> patches 2 and 3 (and 1) apart in preparation for publishing the patch
>> series. The benchmarks in patch 3 refer to the combined effect of patches
>> 1-3 - if you recall, that was an overall 6% speedup.
>>
>> Profiling patch 2 in isolation does actually lead to a 5% regression,
>> though this is more than compensated for by the fact that patch 3 by
>> itself results in 11% speedup. Of course, patch 3 is ARM only. Other
>> architectures will hopefully find that any regression due to patch 2 is
>> compensated for by patches 4-6, plus there's also the option to write
>> versions of patch 3 targeted at them.
>
>
> What do others think about this, is the slowdown acceptable in itself? As
> long as you actually do decoding, this slowdown shouldn't really be
> measurable in the grand scheme of things - or is it? I guess it would have
> most impact on slow systems, and patch 3/6 provides an armv6 implementation.
>
> Is there anyone interested in trying to write an x86 asm version of the same
> function, that would offset the slowdown due to the extra function call?

Is there some reason the actual full "finding a startcode" process
can't be a function, instead of just the candidate?  x264 has a
similar function that abuses SIMD to find byte sequences that need
startcode emulation prevention.

Jason
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
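
For illustration only, here is a rough sketch of what "the full startcode search as one function" could look like in plain C. It is not the code from the patch series and not x264's SIMD routine; the name find_startcode and its return convention are assumptions made up for this example. It scans for the three-byte prefix 00 00 01 and uses the last byte of each three-byte window to decide how far it can safely skip.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not libav's or x264's API): returns the offset of
 * the first 00 00 01 start code prefix in buf, or size if none is found. */
size_t find_startcode(const uint8_t *buf, size_t size)
{
    size_t i = 0;

    while (i + 2 < size) {
        if (buf[i + 2] > 1) {
            /* This byte is neither 00 nor 01, so no prefix can start at
             * i, i + 1 or i + 2: skip the whole window. */
            i += 3;
        } else if (buf[i + 2] == 0) {
            /* A prefix cannot start at i (third byte must be 01), but
             * one might start at i + 1 or i + 2. */
            i += 1;
        } else if (buf[i] == 0 && buf[i + 1] == 0) {
            /* Window is exactly 00 00 01. */
            return i;
        } else {
            /* Third byte is 01 but the window is not a prefix, and 01
             * cannot be the first or second byte of one either. */
            i += 3;
        }
    }
    return size;
}
```

With the whole loop behind one call, an architecture-specific version would replace the entire search rather than only the per-candidate check, so the call overhead is paid once per buffer instead of once per candidate.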
