On Mar 31, 2013, at 11:36 PM, Brad O'Hearne <[email protected]> wrote:
> Presuming there's no unknowns about changing the time_base.den on the fly 
> throughout encoding, problem solved.

> Throughout the weeks of Googling and reading endless source code, forum / 
> mailing list posts, blogs, etc. on this, I had picked up the impression that 
> time_base.den was to be set once prior to encoding and not mucked with 
> thereafter. However, I just used the duration to calculate the frame rate and 
> now I'm setting the time_base.den prior to pts and dts for every frame. Works 
> great. 

For the sake of anyone who might follow with a similar use case, and as a basis 
for making a suggestion, I need to add a footnote to my previous email about the 
resolution. As it turns out, the testing I did at the time, which produced the 
"Works great" conclusion above, didn't vary the frame rate enough to expose the 
fact that there is an actual problem with this approach. By pot luck, I came 
across a manipulation of the video camera which significantly changed the frame 
rate (it cut it in half), and when time_base.den was changed on the fly to match 
the new frame rate, the resulting pts values in the new time_base units produced 
inaccurate timing, the "non-monotonically increasing..." error for pts and dts, 
and the out-of-sync audio and video problem all over again.

As it turns out, the original impression I had picked up -- that time_base should 
not be changed on the fly -- is correct. The time_base should not be changed to 
match a variable frame rate; it should be set up front and remain constant for 
the entire encoding process. Given the current definition of time_base, the 
proper way to handle a variable frame rate is to do the following: 

1. Set time_base.den to a value that the frame rate will never exceed. I set my 
time_base.den to 30, as I didn't foresee ever receiving a frame rate higher than 
30 fps. 

2. Use pts and dts values which increment by 1 for every frame. 

3. Use the presentation time and duration of each received sample buffer to 
calculate the actual frame rate, and from that determine whether the current 
frame should be encoded/written 0 times, 1 time, or multiple times, based on how 
many frames the encoder is expecting at that specific pts. In other words, if 
the frame rate is bouncing around, a particular frame may need to be written 
only once (normal), multiple times (the frame rate has dropped), or not at all 
(the frame rate has increased). A rough code sketch of this follows below. 
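For anyone who wants the gist of step 3 in code, here's a rough sketch of the 
idea. The names (sample_pts_sec, encode_and_write, handle_captured_frame) are 
placeholders for whatever your capture callback and encode/mux calls actually 
are, and the 30 fps ceiling is just my assumption -- adjust to taste: 

#include <math.h>
#include <libavcodec/avcodec.h>

/* Step 1: a fixed time_base, set once before avcodec_open2() and never
 * changed afterward. 30 is my assumed ceiling for the capture frame rate. */
static void set_fixed_time_base(AVCodecContext *codec_ctx)
{
    codec_ctx->time_base = (AVRational){1, 30};
}

/* Next fixed-rate slot to fill; pts increments by exactly 1 per written
 * frame (step 2). */
static int64_t next_pts = 0;

/* Placeholder for the actual encode + mux calls (avcodec_encode_video2()
 * and av_interleaved_write_frame() in my case). */
void encode_and_write(AVCodecContext *codec_ctx, AVFrame *frame);

/* Step 3: called for every sample buffer the capture hands me;
 * sample_pts_sec is the buffer's presentation time in seconds. */
void handle_captured_frame(AVCodecContext *codec_ctx, AVFrame *frame,
                           double sample_pts_sec)
{
    /* Which fixed-rate slot does this sample's presentation time land in? */
    int64_t target_pts =
        (int64_t)llround(sample_pts_sec * codec_ctx->time_base.den);

    /* Frame rate went up: capture is ahead of the fixed clock, drop it. */
    if (target_pts < next_pts)
        return;

    /* Frame rate is steady or has dropped: write the frame once for its own
     * slot, plus once for every slot the capture skipped over. */
    while (next_pts <= target_pts) {
        frame->pts = next_pts++;
        encode_and_write(codec_ctx, frame);
    }
}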

That last step, which ironed out all of the timing issues, made clear to me some 
of the things I had seen in various examples on the Internet (though not in the 
official FFmpeg examples) that spoke of "delayed frames". I'm not completely 
sure they were addressing exactly the same problem, but after having to do this 
myself, the general idea in play made sense -- bottom line, the implementer has 
to fabricate a fixed fps out of a variable fps. 

As a suggestion, I would ask that the FFmpeg maintainers either consider adding 
this fps smoothing for variable fps inside avcodec, or alternatively reconsider 
anchoring time_base to frame rate, a potentially variable metric, in favor of a 
fixed metric against which pts and dts can be reliably and easily converted. 
Frame rate is only truly fixed with either auto-generated frames (as in the 
FFmpeg examples) or when encoding a pre-existing file. For live capture, frame 
rate is variable -- hardware / software / latency, etc. -- not to mention the 
fact that the capture mechanism in play (QTKit or otherwise) doesn't necessarily 
guarantee *any* particular frame rate.

I am not sure of the design reason for making time_base effectively frame-rate 
units, but as stated, frame rate is a potentially varying metric. I would think 
that a time_base anchored to a fixed metric (such as time itself, e.g. 1/1000 
for milliseconds) would be a much more reliable and versatile design, as it 
would serve fixed and variable frame rate scenarios equally well. I found it a 
little strange that I was receiving sample buffers from the capture mechanism 
with *exact* decode time, presentation time, and duration, and yet, while that 
is logically sufficient information to set frame timings, gymnastics and 
compensation were required to accommodate a fixed frame rate, which, as stated, 
is basically fictional in a live-capture scenario. 
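
To illustrate what I mean (purely hypothetical -- I haven't verified that every 
codec/muxer would accept a 1/1000 time_base): 

/* Hypothetical millisecond-anchored time_base, set once up front. */
codec_ctx->time_base = (AVRational){1, 1000};

/* Then pts could come straight from the presentation time the capture
 * callback already hands you (sample_pts_sec, in seconds) -- no smoothing,
 * no fabricated fps. */
frame->pts = (int64_t)llround(sample_pts_sec * 1000.0);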

If there's no alteration to the time_base design, then I would again encourage 
adding fps smoothing to avcodec. If even that is not possible or desirable, at 
least add the algorithm for doing so to the FFmpeg code examples. 

While I still have some code cleanup to do, I have updated the video streaming 
part of my sample app to include this handling, in case anyone now or down the 
road can benefit: 

https://github.com/BigHillSoftware/QTFFmpeg

Cheers, 

Brad
