On 3/10/2015 5:50 AM, Robert O'Callahan wrote:
On Tue, Mar 10, 2015 at 8:29 PM, Randell Jesup <[email protected]> wrote:

    What there isn't currently is a vision for how video processing
    will occur, and how we'll deal with some of the cases mentioned by
    roc. It can be done very painfully once we land Canvas->MS input,
    by doing MS -> video_element -> Canvas (and process) -> MS. This
    isn't great, and doesn't help with things like synchronization,
    etc.  (And maybe Canvas will be part of the final solution space,
    but we don't know yet.)
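
For concreteness, that painful path would look roughly like this (a hypothetical sketch, not shipping API: it assumes canvas.captureStream() ends up being the Canvas->MS input, and uses srcObject for the MS -> video_element hop):

// MS -> video_element -> Canvas (process) -> MS roundtrip.
async function processVideo(input: MediaStream): Promise<MediaStream> {
  const video = document.createElement("video");
  video.srcObject = input;                 // MS -> video_element
  await video.play();

  const canvas = document.createElement("canvas");
  canvas.width = 640;
  canvas.height = 480;
  const ctx = canvas.getContext("2d")!;

  function frame() {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    // ...per-pixel processing on the canvas goes here...
    requestAnimationFrame(frame);          // ticks on the paint clock,
  }                                        // not the media clock
  requestAnimationFrame(frame);

  return canvas.captureStream(30);         // Canvas -> MS, at a guessed rate
}

Every hop there runs on its own clock (requestAnimationFrame vs. the media clock), which is exactly why it doesn't help with synchronization.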

    I'd love to know where we want to go with video processing and
    synchronization. That may be an orthogonal issue; if so, great. I'm
    not sure it's *all* orthogonal, though.


Under the FoxEye project there's a proposal for Worker-based video processing. It doesn't require any kind of MSG blocking though.

Good.


        - Microphone input is going to come in sync with audio in the
        rather near future (once we do full-duplex audio streams), and
        the graph is driven by the audio stream, so it cannot under-run.


    This assumes that we have one output and that all sources are
    synced to that output, or that streams never cross output time
    domain boundaries.  I'm not sure this will be the case moving
    forward, especially as we start to enable output selection and
    multiple outputs.


I hope we can avoid using blocking to handle this. We can time-stretch audio instead.

Time-stretching as done in WebRTC is expensive: it involves analyzing the audio for dominant frequencies and inserting/removing multiples of that period. Resampling at a time-domain boundary is the right way to do it, and it's less expensive (doubly so if the clocks happen to be synced in reality and you can optimize for that case).
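
For the simple case, a sketch of what I mean (hypothetical code; linear interpolation stands in for a real polyphase resampler, and the ratio would be estimated from the observed drift between the two clocks):

// Resample one block of audio across a clock-domain boundary.
// ratio = effective output rate / effective input rate.
function resample(input: Float32Array, ratio: number): Float32Array {
  if (Math.abs(ratio - 1) < 1e-9) {
    return input;  // clocks synced in reality: optimize to a pass-through
  }
  const outLen = Math.floor((input.length - 1) * ratio);
  const output = new Float32Array(outLen);
  for (let i = 0; i < outLen; i++) {
    const pos = i / ratio;       // fractional position in the input
    const j = Math.floor(pos);
    const frac = pos - j;
    output[i] = (1 - frac) * input[j] + frac * input[j + 1];
  }
  return output;
}

No frequency analysis, no splicing at pitch periods; just a ratio and an interpolation kernel.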


        Additionally, blocking is (part of) what makes the MSG
        complicated to read and reason about: for example, getting the
        current time for a stream is a non-straightforward operation
        where it should be a subtraction.

        Pragmatically, I think we should look into removing "blocking"
        from the MSG, to move toward a piece of code that does one
        thing, fast, and is easier to read.


    I admit I'd love to remove blocking, or at least simplify it.  I do
    want to make sure we don't turn around and make things that much
    harder (or worse) for ourselves elsewhere or in the future, or
    force a bunch of boilerplate code to exist in lots of places where
    it can be gotten wrong (see some of Derf's comments).  Perhaps a
    standardized helper class would simplify the job of supporting
    this in sources/sinks.
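
To make the helper idea concrete, roughly the shape I have in mind (purely hypothetical names, not the current MSG API):

type GraphTime = number;  // ticks of the graph's driving (audio) clock

// A source that pads its own gaps with silence labeled as dead time,
// so the graph never has to block on it, and current time becomes a
// plain subtraction.
class GapFillingSource {
  private writtenUpTo: GraphTime;

  constructor(private readonly start: GraphTime) {
    this.writtenUpTo = start;
  }

  // With no blocked intervals to account for, stream-local time is
  // just graph time minus the stream's start time.
  streamTime(graphTime: GraphTime): number {
    return graphTime - this.start;
  }

  // Called each graph iteration: if the source produced nothing, pad
  // up to the graph's current end with labeled silence.
  ensureDataUpTo(graphEnd: GraphTime,
                 appendSilence: (ticks: GraphTime) => void): void {
    if (graphEnd > this.writtenUpTo) {
      appendSilence(graphEnd - this.writtenUpTo);
      this.writtenUpTo = graphEnd;
    }
  }
}

If every source/sink went through something like that, the gap-filling boilerplate would live in one place instead of being re-implemented (and gotten wrong) per source.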

    Can we flesh out a bit more exactly what removing it would result
    in?  Especially for the cases Benjamin et al. brought up?  How much
    would multiple outputs (and multiple inputs) and streams that
    bridge output (clock) domains complicate things?


Can we save MSE videos without re-encoding? I would hope so, in which case MSG is probably not involved.

I would hope saving videos was/is done without re-encoding. Not sure how MSE affects it, but it shouldn't.


Looking again at my original message, use-case class #1 can be handled by labeling dead-time video/audio as such. Can someone go on record as saying classes #2 and #3 aren't worth worrying about?

Paul? Are there alternatives for #2? I don't think there's any alternative to #3. Being able to build NLE (non-linear editing) tools for video would be cool (and would feed into the FoxEye stuff, I imagine), but it's not mandatory.

   Randell