On 16/04/14 02:50, Ben Avison wrote:
> The previous implementation of the parser made four passes over each input
> buffer (reduced to two if the container format already guaranteed the input
> buffer corresponded to frames, such as with MKV). But these buffers are
> often 200K in size, certainly enough to flush the data out of L1 cache, and
> for many CPUs, all the way out to main memory. The passes were:
> 
> 1) locate frame boundaries (not needed for MKV etc)
> 2) copy the data into a contiguous block (not needed for MKV etc)
> 3) locate the start codes within each frame
> 4) unescape the data between start codes
> 
> After this, the unescaped data was parsed to extract certain header fields,
> but because the unescape operation was so large, this was usually also
> effectively operating on uncached memory. Most of the unescaped data was
> simply thrown away and never processed further. Only step 2 - because it
> used memcpy - was using prefetch, making things even worse.
> 
> This patch reorganises these steps so that, aside from the copying, the
> operations are performed in parallel, maximising cache utilisation. No more
> than the worst-case number of bytes needed for header parsing is unescaped.
> Most of the data is, in practice, only read in order to search for a start
> code, for which optimised implementations already existed in the H264 codec
> (notably the ARM version uses prefetch, so we end up doing both remaining
> passes at maximum speed). For MKV files, we know when we've found the last
> start code of interest in a given frame, so we are able to avoid doing even
> that one remaining pass for most of the buffer.
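
To make the bounded-unescape idea concrete, here is a minimal sketch in C. It is not the code from the patch: find_start_code() and unescape_bounded() are hypothetical names, and the escape rule assumed here is the usual VC-1 one (a 0x03 byte following 00 00 and followed by a byte below 4 is dropped). The point is only that the scan covers the whole buffer while the unescape stops after max_out bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to the next 00 00 01 prefix,
 * or end if the buffer contains none. */
static const uint8_t *find_start_code(const uint8_t *p, const uint8_t *end)
{
    assert(p <= end);
    for (; end - p >= 3; p++)
        if (p[0] == 0 && p[1] == 0 && p[2] == 1)
            return p;
    return end;
}

/* Hypothetical helper: copy src to dst, dropping the 0x03 of each
 * 00 00 03 0x escape sequence (x < 4), and stop once max_out bytes
 * have been written. Returns the number of bytes written. */
static size_t unescape_bounded(const uint8_t *src, size_t src_len,
                               uint8_t *dst, size_t max_out)
{
    size_t in = 0, out = 0;

    while (in < src_len && out < max_out) {
        if (in >= 2 && in + 1 < src_len &&
            src[in] == 3 && src[in - 1] == 0 && src[in - 2] == 0 &&
            src[in + 1] < 4) {
            in++;       /* skip the escape byte; the next byte is copied */
            continue;
        }
        dst[out++] = src[in++];
    }
    return out;
}
```

A caller would locate a start code with find_start_code(), then pass only the bytes after it to unescape_bounded() with max_out set to the worst-case header size, leaving the rest of the buffer untouched.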
> 
> In some use-cases (such as the Raspberry Pi) video decode is handled by the
> GPU, but the entire elementary stream is still fed through the parser to
> pick out certain elements of the header which are necessary to manage the
> decode process. As you might expect, in these cases, the performance of the
> parser is significant.
> 
> To measure parser performance, I used the same VC-1 elementary stream in
> either an MPEG-2 transport stream or an MKV file, and fed it through avconv
> with -c:v copy -c:a copy -f null. These are the gperftools counts for
> those streams, both filtered to only include vc1_parse() and its callees,
> and unfiltered (to include the whole binary). Lower numbers are better:
> 
>                 Before          After
> File  Filtered  Mean   StdDev   Mean   StdDev  Confidence  Change
> M2TS  No        861.7  8.2      650.5  8.1     100.0%      +32.5%
> MKV   No        868.9  7.4      731.7  9.0     100.0%      +18.8%
> M2TS  Yes       250.0  11.2     27.2   3.4     100.0%      +817.9%
> MKV   Yes       149.0  12.8     1.7    0.8     100.0%      +8526.3%
> 
> Yes, that last case shows vc1_parse() running 86 times faster! The M2TS
> case does show a larger absolute improvement though, since it was worse
> to begin with.
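
For anyone reading the table: the Change column looks consistent with (Before / After - 1) * 100, so a positive value means the count went down. That formula is inferred from the numbers rather than stated in the message, and percent_change() below is a hypothetical name. A small sketch in C:

```c
#include <assert.h>

/* Assumed formula for the "Change" column: how much larger the
 * "before" count is than the "after" count, as a percentage. */
static double percent_change(double before, double after)
{
    assert(after > 0.0);
    return (before / after - 1.0) * 100.0;
}
```

With the unfiltered M2TS means, percent_change(861.7, 650.5) comes out at about 32.5, matching the table. The filtered rows differ slightly when recomputed this way, presumably because the table was produced from unrounded means.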
> 
> This patch has been tested with the FATE suite (albeit on x86 for speed).
> ---
>  libavcodec/vc1_parser.c |  260 +++++++++++++++++++++++++++++------------------
>  1 files changed, 159 insertions(+), 101 deletions(-)
> 

The results are impressive. Since the playground is now online, I can do
a run over the fuzz collection and see if it behaves properly in those
cases =)

lu

_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
