This patchset improves the startcode search in libavcodec/startcode.c, which is used by the H.264 and the VC-1 parsers. (In this context, "startcode" always means the MPEG-1/2/4, H.264/5, VC-1 startcode 0x00 0x00 0x01, potentially with another leading zero.) There are currently three things to improve about it:
1. It doesn't really find startcodes; it searches for zeros and lets the caller weed out the real startcodes. This leads to lots (millions per GB of video parsed) of unnecessary function calls with the accompanying overhead.
2. It uses a suboptimal pattern for its search; improving it can improve performance.
3. If HAVE_FAST_UNALIGNED is false and there is no system-dependent startcode search function available (ARMv6 is currently the only system with its own search function), it resorts to checking each byte one by one and is therefore very slow.

I have solved all three issues. At first, I wanted to keep using not necessarily aligned reads if HAVE_FAST_UNALIGNED is true, but my benchmarks showed that even then aligned reads are superior, so the new implementation uses aligned reads on all platforms regardless of HAVE_FAST_UNALIGNED. This allowed me to simplify the code a bit. You can take a look at the older version at [1].

The alignment check is actually simple: make sure that a pointer, when cast to uintptr_t, is divisible by 4 or 8, respectively. But the C standard leaves the relationship between a pointer and its uintptr_t value mostly undefined (the only guarantee is that converting a pointer to void to uintptr_t and back to void * yields a pointer that compares equal to the original), so I'd appreciate it if someone tested this on systems where unaligned accesses lead to crashes or to abysmal performance.

It would also be nice if someone could complement my x64 benchmarks with benchmarks for other systems. (Remember: for benchmarks on ARMv6 one should comment out lines 114-117 in libavcodec/arm/h264dsp_init_arm.c (and lines 31-34 in libavcodec/arm/vc1dsp_init_arm.c if one wants to test via the VC-1 parser) to disable the platform-specific startcode-search functions. I am actually curious how my version fares against the hand-written assembly version.)

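For readers who want to experiment with the alignment check and the aligned reads described above, here is a minimal sketch. It is not the code from the patches: is_aligned() and find_first_zero() are hypothetical names, and the word test is the well-known "does this 32-bit word contain a zero byte" bit trick. The sketch only looks for the first zero byte; it does not validate a full 0x00 0x00 0x01 startcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p, converted to uintptr_t, is a multiple
 * of align (which must be a power of two). The C standard does not promise
 * that this integer value reflects the real address alignment; that is an
 * assumption which merely holds on common platforms. */
static int is_aligned(const void *p, size_t align)
{
    assert(align && !(align & (align - 1)));
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical sketch: return the index of the first zero byte in
 * buf[0..size), or size if there is none. */
static size_t find_first_zero(const uint8_t *buf, size_t size)
{
    size_t i = 0;

    /* Check single bytes until buf + i is 4-byte aligned. */
    while (i < size && !is_aligned(buf + i, 4)) {
        if (!buf[i])
            return i;
        i++;
    }

    /* Aligned 32-bit reads; memcpy sidesteps strict-aliasing issues and is
     * typically compiled to a single load. The expression below is nonzero
     * if and only if at least one byte of w is zero. */
    while (size - i >= 4) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof(w));
        if ((w - 0x01010101U) & ~w & 0x80808080U)
            break;
        i += 4;
    }

    /* Locate the exact byte inside the flagged word, or finish the tail. */
    while (i < size && buf[i])
        i++;
    return i;
}
```

Because the final loop inspects individual bytes, the result does not depend on endianness. Running this on a buffer that is deliberately offset by 1, 2 and 3 bytes is a quick way to exercise the unaligned prologue on a strict-alignment target.
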
One should not use a container like Matroska to test this, because there every block contains a whole frame, so the startcode search isn't used at all. Use e.g. transport streams. And when benchmarking, one should not benchmark calls to the startcode_find_candidate function directly (because the current code returns lots of false positives and this patchset changes this), but rather the calls to h264_find_frame_end or to h264_parse.

- Andreas

PS: Thanks to Mark for testing the earlier version [1] of this patchset on an ARM device where (so he thought) unaligned accesses would lead to SIGBUS (or rather: can be configured to do so). Although he has encountered no issues with my patch, he thinks that the CPU fixes up four-byte unaligned accesses by itself, whereas eight-byte unaligned accesses trap as expected. So further testing would be good.

[1]: https://github.com/mkver/FFmpeg/commits/start_3

Andreas Rheinhardt (5):
  startcode: Use common macro
  startcode: Switch to aligned reads
  startcode: Stop overreading
  startcode: Don't return false positives
  startcode: Filter out non-startcodes earlier

 libavcodec/h264dsp.h   |   7 +--
 libavcodec/startcode.c | 128 ++++++++++++++++++++++++++++++++++-------
 libavcodec/vc1dsp.h    |   6 +-
 3 files changed, 112 insertions(+), 29 deletions(-)

-- 
2.21.0