On 3/10/2015 5:50 AM, Robert O'Callahan wrote:
> On Tue, Mar 10, 2015 at 8:29 PM, Randell Jesup <[email protected]> wrote:
>> What there isn't currently is a vision for how video processing will
>> occur, and how we'll deal with some of the cases mentioned by roc. It
>> can be done very painfully once we land Canvas->MS input, by doing
>> MS -> video_element -> Canvas (and process) -> MS. This isn't great,
>> and doesn't help with things like synchronization, etc. (And maybe
>> Canvas will be part of the final solution space, but we don't know
>> yet.)
>>
>> I'd love to know where we want to go with video processing and
>> synchronization. That may be an orthogonal issue; if so, great. I'm
>> not sure it's *all* orthogonal, though.
> Under the FoxEye project there's a proposal for Worker-based video
> processing. It doesn't require any kind of MSG blocking, though.
Good.
>>> - Microphones are going to come in sync with audio in the rather
>>>   near future (once we do full-duplex audio streams), and the graph
>>>   is driven by the audio stream, so it cannot under-run.
>> This assumes that we have one output and that all sources are synced
>> to that output, or that streams never cross output time-domain
>> boundaries. I'm not sure this will be the case moving forward,
>> especially as we start to enable output selection and multiple
>> outputs.
> I hope we can avoid using blocking to handle this. We can time-stretch
> audio instead.
Time-stretching as done in WebRTC is expensive: it involves analyzing
the audio for dominant frequencies and inserting/removing multiples of
that period. Resampling at a time-domain boundary is the right way to
do it (and less expensive; doubly so if the clocks happen to be synced
in reality and you can optimize for that case).
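
To make that concrete, here's a rough sketch of resampling at a
clock-domain boundary. All names here are made up for illustration,
not existing MSG code, and linear interpolation just keeps the sketch
short (a real bridge would use a proper polyphase resampler plus drift
estimation):

    // Sketch only: a hypothetical bridge between two clock domains that
    // resamples rather than time-stretching.
    #include <cstddef>
    #include <vector>

    class ClockDomainBridge {
    public:
      // aRatio = input rate / output rate as actually observed, e.g.
      // 48000.7 / 48000.0 when the input clock runs slightly fast.
      explicit ClockDomainBridge(double aRatio) : mRatio(aRatio) {}

      std::vector<float> Process(const std::vector<float>& aInput) {
        if (mRatio == 1.0) {
          return aInput;  // clocks synced in reality: plain copy, no work
        }
        // Keep the last sample of the previous block so interpolation
        // can straddle block boundaries.
        std::vector<float> buf;
        buf.reserve(aInput.size() + 1);
        buf.push_back(mHistory);
        buf.insert(buf.end(), aInput.begin(), aInput.end());

        std::vector<float> out;
        double pos = mFracPos;  // read position, in input samples
        while (pos + 1.0 < static_cast<double>(buf.size())) {
          size_t i = static_cast<size_t>(pos);
          double frac = pos - static_cast<double>(i);
          // Linear interpolation between adjacent input samples.
          out.push_back(static_cast<float>((1.0 - frac) * buf[i] +
                                           frac * buf[i + 1]));
          pos += mRatio;  // advance by the clock-drift ratio
        }
        mFracPos = pos - static_cast<double>(buf.size() - 1);
        mHistory = buf.back();
        return out;
      }

    private:
      double mRatio;
      double mFracPos = 0.0;  // fractional read position carried over
      float mHistory = 0.0f;  // last input sample of the previous block
    };

The point being: the cost is local to the boundary, and it collapses to
a copy in the synced case, versus time-stretching's signal analysis
over every block.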
>>> Additionally, blocking is (part of) what makes the MSG complicated
>>> to read and reason about: for example, getting the current time for
>>> a stream is a non-straightforward operation where it should be a
>>> simple subtraction.
>>>
>>> Pragmatically, I think we should look into removing "blocking" from
>>> the MSG, to move toward a piece of code that does one thing fast and
>>> is easier to read.
>> I admit I'd love to remove blocking, or at least simplify it. I do
>> want to make sure we don't turn around and make things that much
>> harder for ourselves elsewhere or in the future, or force a bunch of
>> boilerplate code to exist in lots of places where it can be gotten
>> wrong (see some of Derf's comments). Perhaps some standardized helper
>> class would simplify the job of supporting this in sources/sinks.
>>
>> Can we flesh out a bit more of exactly what removing it would result
>> in? Especially for the cases Benjamin and others brought up? How much
>> would multiple outputs (and multiple inputs), and streams that bridge
>> output (clock) domains, complicate things?
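
To sketch what I mean by a standardized helper (hypothetical names,
not an existing MSG interface): a source hands its output through a
small object that pads silence on under-run and keeps frame counts, so
downstream never blocks and "current time" really does become simple
arithmetic, per Paul's point above:

    // Sketch only: the sort of helper a source could delegate to.
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    class SourcePaddingHelper {
    public:
      explicit SourcePaddingHelper(uint32_t aSampleRate)
        : mRate(aSampleRate) {}

      // Called once per graph iteration. If the source produced less
      // than the graph asked for, pad with silence locally instead of
      // blocking every downstream consumer.
      std::vector<float> Fulfill(std::vector<float> aProduced,
                                 size_t aRequestedFrames) {
        if (aProduced.size() < aRequestedFrames) {
          mPaddedFrames += aRequestedFrames - aProduced.size();
          aProduced.resize(aRequestedFrames, 0.0f);  // append silence
        }
        mTotalFrames += aRequestedFrames;
        return aProduced;
      }

      // Without blocking, current time is just a division.
      double CurrentTimeSeconds() const {
        return static_cast<double>(mTotalFrames) / mRate;
      }

      uint64_t PaddedFrames() const { return mPaddedFrames; }  // diagnostics

    private:
      uint32_t mRate;
      uint64_t mTotalFrames = 0;
      uint64_t mPaddedFrames = 0;
    };

Something like that in one place would keep each source/sink from
reinventing (and getting wrong) its own under-run handling.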
> Can we save MSE videos without re-encoding? I would hope so, in which
> case MSG is probably not involved.
I would hope saving videos was/is done without re-encoding. Not sure
how MSE affects it, but it shouldn't.
> Looking again at my original message, use-case class #1 can be handled
> by labeling dead-time video/audio as such. Can someone go on record as
> saying classes #2 and #3 aren't worth worrying about?
Paul? Are there alternatives for #2? I don't think there's any
alternative to #3. Being able to build NLE editors for video would be
cool (and feed into the FoxEye stuff, I imagine), but not mandatory.
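
For the record, here's roughly all I'd expect #1 to need if we tag
segments rather than block; the types below are made up for
illustration, not existing MSG code:

    // Sketch only: hypothetical segment labeling for dead-time.
    #include <cstdint>

    enum class SegmentLiveness : uint8_t {
      Live,      // real captured media
      DeadTime,  // source intentionally idle; a recorder can drop or
                 // null-encode this span instead of the graph blocking
    };

    struct LabeledSegment {
      SegmentLiveness mLiveness;
      uint64_t mDurationFrames;
      // ... sample/frame data, present only for Live segments ...
    };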
Randell