On 16/04/14 02:50, Ben Avison wrote:
> The previous implementation of the parser made four passes over each input
> buffer (reduced to two if the container format already guaranteed the input
> buffer corresponded to frames, such as with MKV). But these buffers are
> often 200K in size, certainly enough to flush the data out of L1 cache, and
> for many CPUs, all the way out to main memory. The passes were:
>
> 1) locate frame boundaries (not needed for MKV etc)
> 2) copy the data into a contiguous block (not needed for MKV etc)
> 3) locate the start codes within each frame
> 4) unescape the data between start codes
>
> After this, the unescaped data was parsed to extract certain header fields,
> but because the unescape operation was so large, this was usually also
> effectively operating on uncached memory. Most of the unescaped data was
> simply thrown away and never processed further. Only step 2 - because it
> used memcpy - was using prefetch, making things even worse.
>
> This patch reorganises these steps so that, aside from the copying, the
> operations are performed in parallel, maximising cache utilisation. No more
> than the worst-case number of bytes needed for header parsing is unescaped.
> Most of the data is, in practice, only read in order to search for a start
> code, for which optimised implementations already existed in the H264 codec
> (notably the ARM version uses prefetch, so we end up doing both remaining
> passes at maximum speed). For MKV files, we know when we've found the last
> start code of interest in a given frame, so we are able to avoid doing even
> that one remaining pass for most of the buffer.
>
> In some use-cases (such as the Raspberry Pi) video decode is handled by the
> GPU, but the entire elementary stream is still fed through the parser to
> pick out certain elements of the header which are necessary to manage the
> decode process. As you might expect, in these cases, the performance of the
> parser is significant.
>
> To measure parser performance, I used the same VC-1 elementary stream in
> either an MPEG-2 transport stream or a MKV file, and fed it through avconv
> with -c:v copy -c:a copy -f null. These are the gperftools counts for
> those streams, both filtered to only include vc1_parse() and its callees,
> and unfiltered (to include the whole binary). Lower numbers are better:
>
>                    Before           After
> File  Filtered  Mean   StdDev   Mean   StdDev  Confidence   Change
> M2TS  No        861.7    8.2    650.5    8.1     100.0%     +32.5%
> MKV   No        868.9    7.4    731.7    9.0     100.0%     +18.8%
> M2TS  Yes       250.0   11.2     27.2    3.4     100.0%    +817.9%
> MKV   Yes       149.0   12.8      1.7    0.8     100.0%   +8526.3%
>
> Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
> case does show a larger absolute improvement though, since it was worse
> to begin with.
>
> This patch has been tested with the FATE suite (albeit on x86 for speed).
> ---
>  libavcodec/vc1_parser.c | 260 +++++++++++++++++++++++++++++------------------
>  1 files changed, 159 insertions(+), 101 deletions(-)
>
The results are impressive. Since the playground is now online, I can do a
run over the fuzz collection and see whether it behaves properly in those
cases =)

lu

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
