On Mar 31, 2013, at 6:32 PM, Kalileo <[email protected]> wrote:

Kalileo -- thanks for the reply. I'm not sure if you've read this thread and 
everything I've written, but based on the questions it appears you may have 
missed a post or two, so please forgive me if there's a rehash here.

> There's a lot of half-theory in your questions, and i get confused about your 
> intentions. Do you want to solve a problem (of video/audio not being in sync) 
> or do you want to redesign the dts/pts concept of video/audio syncing?

> Didn't you say that it's _not_ in sync now? So obviously you've to correct 
> one side, not do the same modification on both sides.
> 
> I do not understand why you need to make this so complicated. It is so easy, 
> same PTS = to be played at the same time.

I'll do my best to distill this all down as simply as possible. 

THE GOAL
Capture video and audio from QTKit, pass the decompressed video and audio 
sample buffers to FFmpeg for encoding to FLV, and output the encoded frames 
(ultimately to a network stream, but in my present prototype app, a file).
This is a live capture-encode-stream use case where the video is then
broadcast and played by multiple users in as near real time as possible.
Latency and delay need to be minimized, and eliminated to the degree
possible.
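
For concreteness, here is a minimal sketch (my own illustration, not code
from this thread) of the output side of such a pipeline; the function name
open_flv_output and the "dest" parameter are placeholders. The same FLV
muxer setup works whether the destination is a local file or an RTMP URL:

#include <libavformat/avformat.h>

/* Open an FLV output context for either "out.flv" or "rtmp://...".
   Error handling trimmed for brevity. */
AVFormatContext *open_flv_output(const char *dest)
{
    AVFormatContext *oc = NULL;

    avformat_alloc_output_context2(&oc, NULL, "flv", dest);
    if (!oc)
        return NULL;

    /* The FLV muxer writes to a byte stream, so open the destination
       unless the format declares AVFMT_NOFILE. */
    if (!(oc->oformat->flags & AVFMT_NOFILE))
        avio_open(&oc->pb, dest, AVIO_FLAG_WRITE);

    return oc;
}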

THE PROBLEM
I have finally determined through many hours of testing that the problem here
is NOT the pts and dts values I am assigning. The values I am assigning to pts
and dts are 100% accurate -- every video and audio sample buffer received from
QuickTime (QTSampleBuffer) delivers its exact presentation time, decode time,
and duration. When I plug these values into the AVPacket pts and dts fields,
video and audio are perfectly synced provided that -- and here's the crux of
the issue -- the time_base.den value matches EXACTLY the *actual* frame rate
of the captured frames being returned. If the actual frame rate differs from
the frame rate indicated in time_base.den, the video does not play properly.

In my specific use case, I had configured a minimum frame rate of 24 fps on my
QTDecompressedVideoCaptureOutput, and so, expecting that frame rate, I
configured my codec context time_base.den to be 24 as well. What happened,
however, is that despite being configured to output 24 fps, it actually output
fewer fps, and when that happened -- even though the pts and dts values were
the exact ones delivered on the sample buffers -- the video played much faster
than it should have, while the audio was still perfect. So I manually went
through my console log, counted how many frames per second were actually being
received from capture (15), and hard-coded 15 as the time_base.den value. I
reran my code with no other changes, and the video and audio synced perfectly.
The problem is the nature of the time_base and however it is being used
internally in encoding.
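
To make the workaround concrete, here is a tiny sketch (my paraphrase, not
the actual code from this project; the function name and observed_fps
parameter are made up) of what hard-coding the observed rate amounts to:

#include <libavcodec/avcodec.h>

/* Set the encoder's time base from the frame rate actually observed during
   capture (15 fps in the test above) rather than the configured 24 fps. */
static void configure_video_time_base(AVCodecContext *enc, int observed_fps)
{
    enc->time_base = (AVRational){ 1, observed_fps };   /* e.g. {1, 15} */
}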

Here is the present problem in a single statement: the encoding process 
requires that the time_base.den value on the codec context be set *prior to 
encoding* to a fixed fps, but if the actual fps varies from the time_base.den
fps, the video doesn't play properly (and any relative adjustment you try to
make to pts in time_base units will be off as well). That's it in a nutshell --
there's no guarantee that a capture source is going to deliver frames at the 
fixed fps in the time_base, and if it doesn't, timing is off.
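
One way to avoid depending on a fixed fps at all -- sketched below under my
own assumptions, not something taken from this thread -- is to keep the
capture timestamps in a fine-grained time base (microseconds here) and
rescale them into whatever time_base the encoder or stream ends up using via
av_rescale_q(); the name capture_pts_us is a hypothetical variable:

#include <stdint.h>
#include <libavutil/mathematics.h>
#include <libavutil/rational.h>

/* The capture clock, expressed in microseconds. */
static const AVRational CAPTURE_TB = { 1, 1000000 };

/* Convert a capture timestamp into the encoder's (or stream's) time base.
   av_rescale_q() computes a*b/c with 64-bit intermediates, so rounding
   error stays bounded no matter what the actual frame rate turns out to be. */
static int64_t to_encoder_pts(int64_t capture_pts_us, AVRational encoder_tb)
{
    return av_rescale_q(capture_pts_us, CAPTURE_TB, encoder_tb);
}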

THOUGHTS
I don't know how the various codecs work internally (mine is adpcm_swf), but
just from pounding on them with tests from the outside, it appears that
time_base.den governs almost everything. As stated, it unfortunately wants a
fixed value for what is, in a capture scenario, a variable quantity, so even
though I have presentation time, decode time, and duration, the disparity
between the actual frame rate and time_base.den throws everything off.

I am curious about the purpose and use of the AVPacket.duration value. I
suspect it isn't being used at all. I cannot verify this at this point, but
one possibility is that QuickTime is accomplishing a 30 fps frame rate by
delivering 15 fps with each frame's duration doubled. I'd guess that if the
codec context had a time_base oriented to time (such as milliseconds) -- a
metric which does not fluctuate -- and duration were taken into account, none
of this would be a problem. Not knowing the internals of avcodec, however, I
cannot say for sure.
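
To make that idea concrete, here is how it could look (purely an assumption
on my part, not verified against adpcm_swf or the FLV muxer internals; the
function and variable names are made up). A millisecond time_base makes pts
and duration independent of the frame rate, and FLV itself carries timestamps
in milliseconds, so a 1/1000 time base maps naturally onto the container:

#include <libavcodec/avcodec.h>

/* Assumes enc->time_base was set to (AVRational){1, 1000} -- i.e.
   milliseconds -- before the encoder was opened. */
static void stamp_packet(AVPacket *pkt, int64_t pts_ms, int duration_ms)
{
    pkt->pts      = pts_ms;       /* capture presentation time, in ms */
    pkt->dts      = pts_ms;       /* no B-frames in this capture scenario */
    pkt->duration = duration_ms;  /* how long this frame should be shown */
}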

But the QuickTime behavior is a separate issue on the QuickTime side of
things, and it doesn't change the problem in FFmpeg (and we are going to be
doing the same thing on a Windows box soon, so the same problem will exist
there with Windows hardware). With differing cameras, computers, etc., the
capture frame rate cannot be assumed to be fixed (nor is it known up front),
so having to specify an accurate fixed FPS for time_base.den is problematic,
unless there's another way to rectify the problem.

I hope that helps clear up any confusion. 

Thanks,

Brad
_______________________________________________
Libav-user mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/libav-user
