On 3/3/2015 3:19 PM, Paul Adenot wrote:
> On Tue, Mar 3, 2015, at 03:18 AM, Robert O'Callahan wrote:
>> Are these features we care about, or will care about? Can we address
>> them in simpler/better ways than with the MediaStream blocking
>> concept? Or should we just ignore them, take out the blocking
>> support, and add it back in if/when we need to?
>
> I think those use cases represent a small subset of what the MSG is
> used for, or could be used for (in the future). There seems to be a
> gap between what the MSG was designed for and what we are using it
> for today.
That's clearly true; so, per roc, what are the implications?
> We effectively use the MSG as a generic routing mechanism for
> real-time media streams. It seems important for Gecko to perform very
> well at taking real-time input data, processing it, and outputting it,
> as fast as possible (both in wall-clock time from input to output and
> in overall CPU time).
Yes. There's less in-graph processing than was originally envisioned;
that's primarily happening in WebAudio graphs now - and that often will
involve MS -> WebAudio -> MS (or MS -> WebAudio -> media_element).
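For concreteness, the MS -> WebAudio -> MS routing looks like this from
the content-JS side (a minimal sketch using the standard MediaStream
source/destination nodes; the filter in the middle is just placeholder
processing):

    // Route a MediaStream through a Web Audio graph and back out as a
    // MediaStream (MS -> WebAudio -> MS).
    function processThroughWebAudio(input: MediaStream): MediaStream {
      const ctx = new AudioContext();
      const source = ctx.createMediaStreamSource(input); // MS -> WebAudio
      const filter = ctx.createBiquadFilter();           // placeholder processing
      filter.type = "lowpass";
      filter.frequency.value = 4000;
      const dest = ctx.createMediaStreamDestination();   // WebAudio -> MS
      source.connect(filter);
      filter.connect(dest);
      return dest.stream; // hand to a media element, MediaRecorder, PC, ...
    }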
What there isn't currently is a vision for how video processing will
occur, and how we'll deal with some of the cases mentioned by roc. It
can be done very painfully once we land Canvas->MS input, by doing MS ->
video_element -> Canvas (and process) -> MS. This isn't great, and
doesn't help with things like synchronization, etc. (And maybe Canvas
will be part of the final solution space, but we don't know yet.)
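Roughly, that painful path would look like the following sketch,
assuming the Canvas->MS capture API lands as something like
canvas.captureStream() and that media elements can take a stream via
srcObject (both assumptions, not shipped behavior):

    // MS -> video_element -> Canvas (process) -> MS. The rAF loop is
    // exactly why this doesn't help with synchronization: drawing is
    // tied to the paint clock, not the source's frame clock.
    function processVideo(input: MediaStream): MediaStream {
      const video = document.createElement("video");
      video.srcObject = input;
      video.play();

      const canvas = document.createElement("canvas");
      canvas.width = 640;  // assumed size; real code would track the video
      canvas.height = 480;
      const ctx = canvas.getContext("2d")!;

      const drawFrame = () => {
        ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
        // ... per-frame processing on the canvas goes here ...
        requestAnimationFrame(drawFrame);
      };
      requestAnimationFrame(drawFrame);

      return canvas.captureStream(); // Canvas -> MS
    }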
I'd love to know where we want to go with video processing and
synchronization. That may be an orthogonal issue; if so, great. I'm not
sure it's *all* orthogonal, though.
> In this context, input streams are, in real life, very unlikely to be
> unable to deliver data, and not delivering data is certainly not the
> common case, so it should not be something that is expensive to
> compute (or computed at all). I think it can be handled just outside
> the graph without much trouble and with less complexity:
(Audio) Sources will be one of: realtime (mic, RTCPeerConnection,
WebAudio) or non-realtime (streaming, perhaps some dynamically-generated
data). Note that realtime sources *shouldn't* underrun, but they can.
Also note that mic sources may involve a time-domain boundary and thus
resampling (and in some odd cases, transitioning data between
differently-clocked subgraphs could cause "realtime" sources to face
underrun if they aren't resampled).
> - Microphones are going to come in sync with audio in the rather near
> future (once we do full-duplex audio streams), and the graph is driven
> by the audio stream, so it cannot under-run.
This assumes that we have one output and that all sources are synced to
that output, or that streams never cross output time domain boundaries.
I'm not sure this will be the case moving forward, especially as we
start to enable output selection and multiple outputs.
> - HTMLMediaElements that are paused or buffering should just insert
> under-run frames: frames that are rendered as silence when rendering
> is needed, but that can be detected as not being part of the media.
> For video, we just keep the last frame.
Could they be inserted and flagged as under-run frames, allowing easier
processing for things like streaming-video capture (Benjamin's case)?
From the next point, it sounds like this was assumed. (See the sketch
after this list.)
> - Sinks that receive under-run frames decide what they want to do with
> them (for an AudioDestinationNode, that would be "play silence"; for a
> MediaRecorder, that would be "ignore"). If there has been Web Audio
> API processing in between the source and the sink, this under-run
> information is lost, and I believe there is little that can be done.
> - PeerConnections can stretch audio and do all sorts of magic if audio
> packets are dropped; worst case, they insert just enough silence
> frames. We don't want to add under-run frames here; normal silence
> frames should do, I think.
PeerConnections as an output would never underrun unless the entire
graph underruns. PeerConnections as a sink would convert underruns
into silence.
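Pulling the flagged-under-run idea together (the sketch promised above):
one way to picture it is chunks carrying an explicit under-run flag,
with each sink applying its own policy. Types and names here are
hypothetical, purely to make the shape concrete:

    // Hypothetical sketch: sources mark fabricated frames, sinks choose
    // a policy. None of these names exist in the MSG today.
    interface AudioChunk {
      samples: Float32Array;
      isUnderrun: boolean; // true: fabricated to fill a gap, not real media
    }

    // A paused/buffering source inserts flagged silence.
    function underrunChunk(frames: number): AudioChunk {
      return { samples: new Float32Array(frames), isUnderrun: true };
    }

    // AudioDestinationNode-style sink: render under-run chunks as silence.
    function playChunk(chunk: AudioChunk, output: Float32Array): void {
      output.set(chunk.isUnderrun ? new Float32Array(output.length)
                                  : chunk.samples);
    }

    // MediaRecorder-style sink: ignore under-run chunks entirely.
    function recordChunk(chunk: AudioChunk, recorded: AudioChunk[]): void {
      if (!chunk.isUnderrun) recorded.push(chunk);
    }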
> Additionally, blocking is (part of) what makes the MSG complicated to
> read and reason about: for example, getting the current time for a
> stream is a non-straightforward operation, where it should be a simple
> subtraction.
>
> Pragmatically, I think we should look into removing "blocking" from
> the MSG, to move toward a piece of code that does one thing, fast, and
> is easier to read.
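Paul's "should be a subtraction" point, concretely: without blocked
intervals, a stream's current time is just graph time minus the
stream's start time, instead of a walk over the stream's blocked-time
history. (Hypothetical names:)

    // Without blocking, stream time is a plain subtraction. With
    // blocking, this must also subtract the accumulated duration of
    // every blocked interval up to graphTime.
    function streamCurrentTime(graphTime: number,
                               streamStartTime: number): number {
      return graphTime - streamStartTime;
    }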
I admit I'd love to remove blocking, or at least simplify it. I do want
to make sure we don't turn around and make things that much harder (or
more) for ourselves elsewhere or in the future, or force a bunch of
boilerplate code to exist in lots of places where it can be gotten
wrong (see some of Derf's comments). Perhaps some standardized helper
class would simplify the job of supporting this in sources/sinks.
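One possible shape for such a helper (entirely hypothetical names and
API), so individual sources/sinks declare a policy instead of each
re-implementing the flag handling:

    // Entirely hypothetical helper centralizing under-run handling.
    // AudioChunk is the same shape as in the earlier sketch.
    interface AudioChunk { samples: Float32Array; isUnderrun: boolean; }

    type UnderrunPolicy = "emit-silence" | "drop";

    class UnderrunFilter {
      constructor(private policy: UnderrunPolicy) {}

      // Returns the chunk to deliver downstream, or null to drop it.
      process(chunk: AudioChunk): AudioChunk | null {
        if (!chunk.isUnderrun) return chunk;
        if (this.policy === "emit-silence") {
          return { samples: new Float32Array(chunk.samples.length),
                   isUnderrun: true };
        }
        return null; // "drop": e.g. a MediaRecorder-style sink ignores it
      }
    }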
Can we flesh out a bit more exactly what removing it would result in,
especially for the cases Benjamin et al. brought up? How much would
multiple outputs (and multiple inputs), and streams that bridge output
(clock) domains, complicate things?
--
Randell Jesup, Mozilla