On Mon, 05 Aug 2013 16:17:49 +0100, Jason Garrett-Glaser <[email protected]> wrote:
> Is there some reason the actual full "finding a startcode" process
> can't be a function, instead of just the candidate?

It was all about trying to find a simple operation that could
conveniently be written in assembly without introducing too many
interworking issues with the C.

Coding the whole of h264_find_frame_end in assembly didn't make much
sense, partly because the libav and ffmpeg implementations differ, and
partly because, strictly speaking, neither of them follows the H.264
standard's definition of an AU boundary. That logic might therefore be
changed in future, and I didn't want such a change to mean that multiple
assembly implementations needed to be updated in step.

For what it's worth, the function currently scans the input buffer for
start codes, stopping when it finds:
* an AU delimiter, SPS, PPS or SEI NAL unit following a VCL NAL unit; or
* a (VCL) NAL unit of types 1, 2 or 5 (that's most of the VCL NAL units
   containing a slice_header(), but excluding type 19), in which case it
   reads the first_mb_in_slice field

libav takes a first_mb_in_slice of value 0 to mean a new "frame" (I think
it should really say picture but I wasn't about to rename the functions).
ffmpeg buffers up 4 bytes then does an exp-Golomb decode of the field,
and checks to see if it is lower than or the same as in the previous such
NAL unit. See sections 7.4.1.2.3 and 7.4.1.2.4 of the standard for how
it's really supposed to be done.
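
To make the difference between the two heuristics concrete, here is a
sketch of an unsigned exp-Golomb (ue(v)) decode followed by the two tests
as I described them. All of the names are hypothetical and this is not the
code from either project. It also assumes that buf points at the first
byte after the NAL header and ignores emulation prevention bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Read bit number pos (MSB first) from buf. */
static int get_bit(const uint8_t *buf, size_t pos)
{
    return (buf[pos >> 3] >> (7 - (pos & 7))) & 1;
}

/* Decode one ue(v) value from the start of buf.
 * Returns -1 if the code does not fit in the len bytes supplied. */
static long decode_ue(const uint8_t *buf, size_t len)
{
    size_t nbits = len * 8, pos = 0, zeros = 0, i;
    unsigned long value = 1;       /* the 1 bit that ends the prefix */

    while (pos < nbits && !get_bit(buf, pos)) {
        zeros++;                   /* count the leading zero bits */
        pos++;
    }
    if (zeros > 30 || pos + 1 + zeros > nbits)
        return -1;                 /* too long, or runs off the buffer */
    pos++;                         /* skip the terminating 1 bit */
    for (i = 0; i < zeros; i++)    /* append the suffix bits */
        value = (value << 1) | (unsigned long)get_bit(buf, pos++);
    return (long)(value - 1);
}

/* libav-style test: slice starting at macroblock 0 means a new picture. */
static int new_picture_zero_rule(long first_mb)
{
    return first_mb == 0;
}

/* ffmpeg-style test: first_mb_in_slice did not increase. */
static int new_picture_not_increasing_rule(long first_mb, long prev_first_mb)
{
    return first_mb <= prev_first_mb;
}
```

With four buffered bytes this can decode any code whose prefix has up to
15 leading zero bits.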

The other problem is that h264_find_frame_end receives arbitrary blocks
of data, and it may be the case that a start code actually straddles
adjacent blocks. Thus, a full startcode search would need to be stateful;
the way I defined the function was such that the implementation doesn't
need to store any state. Logically, if you did store state, you'd want to
share the state variable currently used by the C code. However, that
variable is cunningly (but obscurely) designed so that it holds not only
the parse state but also a record of whether the start code featured 2 or
3 leading zeros.
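
For illustration, this is roughly what a stateful search has to do to cope
with a start code that straddles two blocks. It is a deliberately simple
sketch with hypothetical names, not the state encoding the existing C code
uses: it just carries the count of trailing zero bytes from one call to
the next, and remembers how many zeros preceded the last start code found.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scanner state, kept by the caller between blocks. */
typedef struct {
    unsigned zeros;         /* consecutive zero bytes seen, capped at 3 */
    unsigned prefix_zeros;  /* leading zeros of the last start code: 2 or 3 */
} sc_state;

/* Scan one block. Returns the offset of the byte after the start code
 * (which may equal len if the NAL header is in the next block), or -1 if
 * no start code ends in this block. */
static long find_startcode(sc_state *st, const uint8_t *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0) {
            if (st->zeros < 3)
                st->zeros++;
        } else {
            unsigned z = st->zeros;

            st->zeros = 0;
            if (buf[i] == 1 && z >= 2) {
                st->prefix_zeros = z;
                return (long)(i + 1);
            }
        }
    }
    return -1;
}
```

The state must be zeroed before the first block and then passed unchanged
to each later call.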

I didn't really fancy building such a complex state variable into the
API, especially when the number of false positives caused by my stateless
implementation was so low. The real problem with the regression in the C
case is that the C candidate search is not actually a very good one: it
merely checks for the presence of a zero byte within a 32-bit or 64-bit
word, so there are lots of false positives, causing the function to be
called many times per startcode.
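
To show why a zero-byte test gives so many false positives, here is the
well-known bit trick for detecting a zero byte in a 32-bit word. I'm
giving it as a sketch of the kind of test involved, under a hypothetical
function name, rather than as a copy of the parser's code.

```c
#include <stdint.h>

/* Nonzero if any of the four bytes of x is zero. */
static int word_has_zero_byte(uint32_t x)
{
    /* Subtracting 1 from each byte borrows into bit 7 only where the
     * byte was zero (or already had bit 7 set, which ~x masks out). */
    return ((uint32_t)(x - 0x01010101u) & (uint32_t)~x & 0x80808080u) != 0;
}
```

Any word containing a single zero byte passes this test, for example
0x12003456, even though a start code needs at least two zero bytes
followed by 0x01.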

What would have been useful was basically a fast inline htonl() operator
in C, but there isn't one: that's why I suspect we're probably ultimately
going to want assembly versions for each CPU.

Ben
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel