On 3/18/2013 8:29 PM, Eric Rescorla wrote:
On Mon, Mar 18, 2013 at 4:54 PM, Robert O'Callahan <[email protected]> wrote:
As far as I know there are two major problems with the way MSG video works
right now:
1) In WebRTC we don't want to hold up playing audio for a time interval
[T1, T2] until all video frames up to and including T2 have been decoded
(MSG currently requires this). We'd rather just go ahead and play the audio
and if video decoding has fallen behind audio, render the latest video
frame as soon as it's available (preferably without waiting for an MSG
iteration). Of course if video decoding is far enough ahead we should sync
video frames to the audio instead (and then MSG needs to be involved since
it has the audio track(s).
It's probably worth mentioning at this point that the current WebRTC video
implementation (like the gUM one) just returns the latest video frame
upon request. So if, say, two video frames come in during the period
between NotifyPull()s, we just deliver the most recent one. Obviously,
we could buffer them and deliver as two segments, but if we went to
a model where we pushed video onto the MSG (which is what GIPS
expects), then we wouldn't bother.
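In code terms, that "latest frame wins" delivery is roughly the following
(a sketch only; the names are invented for illustration, not the actual
gUM/WebRTC classes):

// Hypothetical sketch (not the real gUM/MSG classes): a source that keeps
// only the most recent captured frame and hands it out when the graph
// pulls, silently dropping anything older.
#include <memory>
#include <mutex>

struct VideoFrame {
  std::shared_ptr<const unsigned char[]> mData;  // shared ref to pixel data
  long long mCaptureTimeUs = 0;                  // capture timestamp
};

class LatestFrameSource {
public:
  // Called from the capture thread whenever a new frame arrives.
  void OnFrameCaptured(VideoFrame aFrame) {
    std::lock_guard<std::mutex> lock(mMutex);
    mLatest = std::move(aFrame);  // any undelivered older frame is dropped
    mHaveFrame = true;
  }

  // Called from the graph thread on NotifyPull(); returns false if
  // nothing new has arrived since the last pull.
  bool PullLatest(VideoFrame* aOut) {
    std::lock_guard<std::mutex> lock(mMutex);
    if (!mHaveFrame) {
      return false;
    }
    *aOut = mLatest;
    mHaveFrame = false;  // each frame is delivered at most once
    return true;
  }

private:
  std::mutex mMutex;
  VideoFrame mLatest;
  bool mHaveFrame = false;
};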
In theory, GetUserMedia *should* be pushing in video frames as they
occur and letting the sink decide what to do with them. In my mind,
playout sinks of a MediaStream should pull on Audio and Video (typically
based on the hardware playout clock), while intermediate sinks (like a
PeerConnection) would like to get data from the stream as soon as it's
available - preferably accepting it as it arrives - with the capture
timestamp fed in to let the far end handle sync.
If a source is blocked, the sink should decide whether or not to block
itself (either dynamically, or statically when it becomes a sink). For
realtime use, you typically want to use the latest video frame that
matches (or is slightly newer than) the audio, but never block audio
output on video frames being late or unavailable.
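Concretely, the selection rule for a realtime playout sink might look
something like this (a sketch assuming a small queue of timestamped
frames; the types and the 30fps slack value are made up):

// Sketch of that selection rule (hypothetical types): pick the newest
// frame at or before the audio clock; if video is late, fall back to a
// frame that is only slightly newer; never wait on one that hasn't
// arrived.
#include <deque>
#include <optional>

struct TimedFrame {
  long long mTimeUs;  // capture timestamp in microseconds
  int mFrameId;       // stand-in for the real image handle
};

std::optional<TimedFrame>
PickFrameForAudioTime(std::deque<TimedFrame>& aFrames, long long aAudioTimeUs)
{
  std::optional<TimedFrame> chosen;
  // Newest frame whose timestamp is at or before the audio clock wins.
  while (!aFrames.empty() && aFrames.front().mTimeUs <= aAudioTimeUs) {
    chosen = aFrames.front();
    aFrames.pop_front();
  }
  // Otherwise accept a frame that's only slightly newer than the audio
  // (assumption: ~one frame interval at 30fps) rather than blocking.
  const long long kSlackUs = 33000;
  if (!chosen && !aFrames.empty() &&
      aFrames.front().mTimeUs - aAudioTimeUs <= kSlackUs) {
    chosen = aFrames.front();
    aFrames.pop_front();
  }
  return chosen;  // nullopt == keep showing the previously rendered frame
}

Anything left in the queue that's much newer than the audio just waits
for a later pull; nothing ever blocks the audio path.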
If I had my druthers, I'd want an interface where on Pull you can
specify if you want to block on any missing data or not, and when
pushing video in you could specify a start time and duration, or specify
a start time and no/infinite duration (which would work out to "until
another frame is pushed"). Non-pulling sinks (such as PeerConnection)
generally want any track data as soon as possible, even if other tracks
don't have data yet, and they want to get the data with a timestamp.
GetUserMedia would push video frames with infinite/no durations, as
would PeerConnection sources. Streaming/recorded media may well push
video with defined durations. The last question is about audio.
Pulling sources should generally block on missing audio (though
PeerConnection sources should adapt off the pulls and never cause
blocking, while realtime sampled sources should probably resample or
timebase-correct to adapt to the Puller's frequency). I.e. the Graph
frequency is driven by the output sink, and inputs need to adapt to it,
at least if they're hooked together.
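To make that concrete, here's a very rough sketch of the shape of
interface I'm describing; every name below is invented for illustration
and none of it is an existing MSG API:

// All names here are invented for illustration; nothing is existing MSG API.
#include <cstdint>

using TrackTicks = int64_t;

class ProposedSourceStream {
public:
  // Push a video frame starting at aStart. aDuration == 0 means
  // "no/infinite duration": the frame stays current until the next one
  // is pushed (what gUM and PeerConnection sources would do); recorded
  // or streamed media would pass a real duration.
  virtual void PushVideoFrame(const void* aImage, TrackTicks aStart,
                              TrackTicks aDuration /* 0 = until next */) = 0;

  // Audio always comes with a definite amount of data.
  virtual void PushAudio(const int16_t* aSamples, uint32_t aSampleCount,
                         TrackTicks aStart) = 0;

  virtual ~ProposedSourceStream() = default;
};

class ProposedSink {
public:
  // A playout sink pulling off the hardware clock would pass
  // aBlockOnMissingData = true for audio; a non-pulling sink like
  // PeerConnection would instead be handed per-track data (with capture
  // timestamps) as soon as it exists, even if other tracks have none.
  virtual bool Pull(TrackTicks aFrom, TrackTicks aTo,
                    bool aBlockOnMissingData) = 0;

  virtual ~ProposedSink() = default;
};

The duration-less push is what makes gUM/PeerConnection video cheap to
feed in; streaming/recorded media would just pass real durations through
the same call.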
There's an assumption here that media elements Pull the data and
themselves are driven by the output clocking - or equivalent (it doesn't
actually have to be a Pull from the media; it could be clocked out
(using the main output device clock) by the MediaStream into the media
element).
One side note: a MediaStream that goes from audio capture -> MediaStream
-> PeerConnection doesn't have to be resampled to match the output
clock, though it may be simpler (albeit more expensive) to do so. Of
course, if it's later cloned (the track or stream) and does go to
output, it may need to start resampling/etc (though it could do so at
the cloning point I think). This may be a common usage with a
self-image (a muted <video> element playing the same MediaStream
attached to the PeerConnection).
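As a sketch of that (helper names are hypothetical stand-ins, not real
Gecko classes), the PeerConnection branch just passes the captured audio
through at the capture rate, and only the cloned branch headed for local
output inserts a resampler:

// Hypothetical stand-ins: the PeerConnection branch keeps the capture
// rate; only the cloned branch feeding local output is resampled.
struct AudioChunkStub {
  int mRateHz;  // nominal sample rate of this chunk
  // samples omitted for brevity
};

class ResamplerStub {
public:
  ResamplerStub(int aInRate, int aOutRate) : mInRate(aInRate), mOutRate(aOutRate) {}
  AudioChunkStub Process(const AudioChunkStub& aIn) {
    AudioChunkStub out = aIn;
    out.mRateHz = mOutRate;  // real code would actually resample here
    return out;
  }
private:
  int mInRate;
  int mOutRate;
};

// Capture -> PeerConnection: stays on the capture clock; the capture
// timestamps let the far end handle sync.
AudioChunkStub ForPeerConnection(const AudioChunkStub& aCaptured) {
  return aCaptured;  // no resampling on this path
}

// Cloned branch -> local output: resample to the output device clock at
// (or after) the clone point.
AudioChunkStub ForLocalOutput(const AudioChunkStub& aCaptured, int aOutputRateHz) {
  ResamplerStub resampler(aCaptured.mRateHz, aOutputRateHz);
  return resampler.Process(aCaptured);
}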
Those are my off-the-cuff thoughts (nothing seriously new here); I may
have missed a point somewhere - please feel free to critique. (derf,
jmspeex especially)
Side note:
In theory you could switch on resampling when the <video> element
un-muted (and only pull video frames until then), but that's getting
pretty complex. If you want to be really accurate, elements should
know if they're visible and only be synced to output if they're somehow
routed to audio outputs. (For example, a hidden video element being
used to capture a MediaStream for playback to a PeerConnection wouldn't
have to be synced to audio output - but one that's connected to a
visible <video> element would have to be. It can get complex if you
want to do the maximal version of this, which I wouldn't advise -
certainly not now.)
Note that in GIPS, video frames have times of arrival but no duration,
so there is a difficult match there as well.
Correct.
-Ekr
2) Various devices implement stream capture using ring buffers and
therefore don't really want to give away references to image buffers that
can live indefinitely ... so these image buffers aren't a good fit for the
Image object, which allows Gecko code to keep an Image alive indefinitely
... unless we make copies of images, which of course we want to avoid. So
we'd really like a SourceMediaStream to be able to manage the lifetimes of
its frames, most of the time, and make frame copies (if necessary) only in
exceptional cases.
Let me know if there are important issues I've overlooked. And share your
ideas if you have a solution. I'm still thinking :-).
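For (2), here's the kind of ownership model I'd imagine - all of this is
a hypothetical sketch, not existing Gecko code: a fixed ring of capture
slots owned by the source, with consumers holding a generation-checked
handle and copying out only when they actually need the image to outlive
the slot.

// Hypothetical sketch for (2): capture buffers live in a fixed ring owned
// by the source; consumers normally just borrow the slot, and only a
// consumer that must keep the image alive indefinitely (the Image-object
// case) pays for a copy.
#include <cstdint>
#include <vector>

struct CaptureSlot {
  std::vector<uint8_t> mPixels;  // backing storage, reused in place
  uint32_t mGeneration = 0;      // bumped each time the ring recycles the slot
};

class FrameHandle {
public:
  FrameHandle(CaptureSlot* aSlot, uint32_t aGeneration)
    : mSlot(aSlot), mGeneration(aGeneration) {}

  // Cheap path: borrow the pixels, valid only until the ring reuses this
  // slot for a newer frame.
  const uint8_t* PeekIfStillValid() const {
    return (mSlot->mGeneration == mGeneration) ? mSlot->mPixels.data()
                                               : nullptr;
  }

  // Exceptional path: deep-copy the pixels so they can outlive the slot;
  // returns an empty buffer if the slot was already recycled.
  std::vector<uint8_t> CopyOut() const {
    if (mSlot->mGeneration != mGeneration) {
      return {};
    }
    return mSlot->mPixels;
  }

private:
  CaptureSlot* mSlot;
  uint32_t mGeneration;
};

That keeps the common case copy-free while still letting something like
the Image object hang onto pixels when it really must.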
--
Randell Jesup