On Mon, 05 Aug 2013 16:17:49 +0100, Jason Garrett-Glaser <[email protected]> wrote:
> Is there some reason the actual full "finding a startcode" process can't be a function, instead of just the candidate?

It was all about trying to find a simple operation that could conveniently be written in assembly without introducing too many interworking issues with the C. Coding the whole of h264_find_frame_end in assembly didn't make much sense, partly because there are implementation differences in libav and ffmpeg, partly because strictly speaking neither of them actually follows the H.264 standard's definition of an AU boundary, so it might be changed in future, and I didn't want that to mean that multiple assembly implementations needed to be updated in line.

For what it's worth, the function currently scans the input buffer for start codes, stopping when it finds:

* an AU delimiter, SPS, PPS or SEI NAL unit following a VCL NAL unit; or
* a (VCL) NAL unit of types 1, 2 or 5 (that's most of the VCL NAL units containing a slice_header(), but excluding type 19), in which case it reads the first_mb_in_slice field.

libav takes a first_mb_in_slice of value 0 to mean a new "frame" (I think it should really say picture but I wasn't about to rename the functions). ffmpeg buffers up 4 bytes then does an exp-Golomb decode of the field, and checks to see if it is lower than or the same as in the previous such NAL unit. See sections 7.4.1.2.3 and 7.4.1.2.4 of the standard for how it's really supposed to be done.

The other problem is that h264_find_frame_end receives arbitrary blocks of data, and it may be the case that a start code actually straddles adjacent blocks. Thus, a full startcode search would need to be stateful; the way I defined the function was such that the implementation doesn't need to store any state. Logically, if you did, then you'd want to share the state variable currently used by the C code, however it's cunningly (but obscurely) designed such that not only does it hold the parse state, but also a record of whether the start code featured 2 or 3 leading zeros.
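
To illustrate why a full search has to carry state between calls, here is a minimal sketch of a stateful start code scanner. The names sc_state and find_startcode are hypothetical, and the state layout (a simple count of consecutive zero bytes, capped at 3) is an assumption made for clarity; it is not the encoding of the state variable used by the C code in libav or ffmpeg.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state: consecutive zero bytes seen so far, capped at 3.
 * This is an illustrative layout, not the one used in libav/ffmpeg. */
typedef struct {
    int zeros;
} sc_state;

/* Scan one block for a start code (00 00 01 or 00 00 00 01).
 * Returns the index of the byte following the 0x01, or -1 if no start
 * code was completed in this block. On success, *leading_zeros is set
 * to 2 or 3. The state survives across calls, so a start code that
 * straddles two blocks is still detected. */
static long find_startcode(sc_state *st, const uint8_t *buf, size_t len,
                           int *leading_zeros)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t b = buf[i];
        if (b == 0) {
            if (st->zeros < 3)
                st->zeros++;
        } else {
            if (b == 1 && st->zeros >= 2) {
                *leading_zeros = st->zeros;
                st->zeros = 0;
                return (long)(i + 1);
            }
            st->zeros = 0;   /* any other byte breaks the zero run */
        }
    }
    return -1;
}
```

Feeding the blocks { 0xAA, 0x00, 0x00 } and then { 0x01, 0x65 } through the same sc_state reports nothing for the first block and a start code with 2 leading zeros at offset 1 of the second. A stateless function cannot see that case, which is the trade-off discussed here.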
I didn't really fancy defining such a complex state variable into the API, especially when the number of false positives caused by my stateless implementation was so low.

The real problem with the regression in the C case is that it's not actually a very good search: it merely checks for the presence of a zero byte within a 32-bit or 64-bit word, so there are lots of false positives, causing the function to be called many times per startcode. What would have been useful was basically a fast inline htonl() operator in C, but there isn't one: that's why I suspect we're probably ultimately going to want assembly versions for each CPU.

Ben
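
For reference, the kind of word-at-a-time zero-byte test described above can be sketched as follows. The exact expression used in the C code is not quoted in this message, so the well-known bit trick below is an assumption, and word_has_zero_byte and count_candidate_words are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes of v is zero. This says nothing about
 * whether the zero byte is part of a 00 00 01 sequence, which is where
 * the false positives come from. */
static int word_has_zero_byte(uint32_t v)
{
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}

/* Count the aligned 4-byte words in buf that would be flagged as start
 * code candidates by the zero-byte test alone. */
static size_t count_candidate_words(const uint8_t *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i + 4 <= len; i += 4) {
        uint32_t v;
        memcpy(&v, buf + i, sizeof v);  /* avoids unaligned access */
        n += (size_t)word_has_zero_byte(v);
    }
    return n;
}
```

On the 12-byte buffer { 1, 2, 0, 4, 5, 6, 7, 8, 0, 0, 1, 0x65 }, two words are flagged as candidates although only the last one contains a start code, so any stray zero byte in the payload triggers an extra call.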
