On Mon, Aug 5, 2013 at 5:29 AM, Martin Storsjö wrote:
> On Wed, 31 Jul 2013, Ben Avison wrote:
>
>> On Wed, 31 Jul 2013 14:14:02 +0100, Hendrik Leppkes wrote:
>>>
>>> Did you measure the overhead from the extra call, without any special
>>> asm enhanced versions?
>>
>> I was rather hoping nobody would ask that, to save me the trouble of
>> having to go back and re-profile them. The truth is that I only split
>> patches 2 and 3 (and 1) apart in preparation for publishing the patch
>> series. The benchmarks in patch 3 refer to the combined effect of patches
>> 1-3; if you recall, that was an overall 6% speedup.
>>
>> Profiling patch 2 in isolation does actually lead to a 5% regression,
>> though this is more than compensated for by the fact that patch 3 by
>> itself results in an 11% speedup. Of course, patch 3 is ARM only. Other
>> architectures will hopefully find that any regression due to patch 2 is
>> compensated for by patches 4-6, and there is also the option to write
>> versions of patch 3 targeted at them.
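The split under discussion can be sketched in C as follows, assuming the candidate search returns the index of the first zero byte and is reached through a function pointer, while the caller confirms the 00 00 01 pattern itself. The names find_candidate_fn, find_candidate_c and find_startcode are hypothetical, chosen for this sketch rather than taken from the patches; the point is only to show where the extra call overhead comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical contract for the candidate search: return the index of the
 * first zero byte in buf[0..size), or size if there is none. */
typedef int (*find_candidate_fn)(const uint8_t *buf, int size);

/* Plain C candidate search: one byte per iteration. */
static int find_candidate_c(const uint8_t *buf, int size)
{
    int i;
    assert(size >= 0);
    for (i = 0; i < size; i++)
        if (buf[i] == 0)
            break;
    return i;
}

/* Return the offset of the first 00 00 01 start code in buf[0..size),
 * or size if none is found. Every call to find_candidate is an indirect
 * call, and the search restarts after each zero byte that does not begin
 * a start code. */
static int find_startcode(const uint8_t *buf, int size,
                          find_candidate_fn find_candidate)
{
    int i = 0;
    while (i + 2 < size) {
        i += find_candidate(buf + i, size - i);
        if (i + 2 >= size)
            break;                 /* too few bytes left for 00 00 01 */
        if (buf[i + 1] == 0 && buf[i + 2] == 1)
            return i;
        i++;                       /* not a start code: skip this zero */
    }
    return size;
}
```

Each pass around the loop pays for one indirect call, which is the kind of overhead patch 2 adds on its own; an optimized candidate search has to win that back by scanning faster.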
>
> What do others think about this? Is the slowdown acceptable in itself? As
> long as you actually do decoding, this slowdown shouldn't really be
> measurable in the grand scheme of things, or would it be? I guess it would
> have the most impact on slow systems, and patch 3/6 provides an armv6
> implementation.
>
> Is there anyone interested in trying to write an x86 asm version of the same
> function, to offset the slowdown caused by the extra function call?
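Without x86 asm, the same trade-off can be illustrated in portable C: a candidate search that tests eight bytes per step with the well-known SWAR "word has a zero byte" test, dropping to a byte loop only to locate the exact zero. This is a hedged sketch, not an x86 implementation and not code from the patch series; find_candidate_swar and has_zero_byte are hypothetical names, and the sketch assumes a 64-bit word is a sensible step size on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of x is zero (classic SWAR test; exact
 * when used only as a yes/no answer). */
static int has_zero_byte(uint64_t x)
{
    return ((x - UINT64_C(0x0101010101010101)) & ~x &
            UINT64_C(0x8080808080808080)) != 0;
}

/* Index of the first zero byte in buf[0..size), or size if none.
 * Eight bytes are tested per step; memcpy keeps unaligned loads legal. */
static int find_candidate_swar(const uint8_t *buf, int size)
{
    int i = 0;
    assert(size >= 0);
    for (; i + 8 <= size; i += 8) {
        uint64_t word;
        memcpy(&word, buf + i, sizeof(word));
        if (has_zero_byte(word))
            break;                 /* a zero lies somewhere in these 8 bytes */
    }
    for (; i < size; i++)          /* locate the exact byte, or scan the tail */
        if (buf[i] == 0)
            break;
    return i;
}
```

Because it keeps the same contract as a byte-by-byte candidate search, a function like this could sit behind the same function pointer; the byte order of the loaded word does not matter, since the exact position is found by the byte loop.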

Is there some reason the whole "find a startcode" process can't be a
function, instead of just the candidate search?  x264 has a similar
function that abuses SIMD to find byte sequences that need startcode
emulation prevention.
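The x264 function mentioned here is SIMD code and is not reproduced. As a scalar sketch of the job it does on the encoder side, the following copies a NAL unit payload and inserts an emulation prevention byte (0x03) wherever two zero bytes would otherwise be followed by a byte in the range 0x00 to 0x03, handling the whole buffer in a single call. The name nal_escape_sketch is hypothetical, and the sketch omits the rule that a payload ending in 0x00 also needs a trailing 0x03.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy src[0..size) to dst, inserting 0x03 after any two consecutive zero
 * bytes that would otherwise be followed by 0x00..0x03. dst needs room for
 * size + size / 2 bytes (worst case, all-zero input). Returns the number
 * of bytes written. The trailing-0x00 rule is not handled. */
static size_t nal_escape_sketch(uint8_t *dst, const uint8_t *src, size_t size)
{
    size_t out = 0;
    int zeros = 0;                 /* consecutive zero bytes just written */
    assert(dst != NULL || size == 0);
    for (size_t i = 0; i < size; i++) {
        if (zeros >= 2 && src[i] <= 0x03) {
            dst[out++] = 0x03;     /* emulation prevention byte */
            zeros = 0;
        }
        dst[out++] = src[i];
        zeros = (src[i] == 0) ? zeros + 1 : 0;
    }
    return out;
}
```

A SIMD version would speed up the search for the 00 00 pairs, but the output must match this byte-level rule.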

Jason
_______________________________________________
libav-devel mailing list
https://lists.libav.org/mailman/listinfo/libav-devel
