First, some background. The basic architecture of Firefox's video decoder is described here:

http://blog.pearce.org.nz/2011/02/firefox-4-video-decoder-architecture.html

This is a bit out of date now: the "ns" prefixes have been removed, and the "nsBuiltin" prefix has been replaced with "Media", e.g. MediaDecoder, MediaDecoderStateMachine. You should read that blog post to get a feel for the general neighbourhood you're working in.



The bits relevant to the "should we inherit from MediaDecoder?" question are:

1. Each nsHTMLMediaElement has one MediaDecoder for decoding the video
   file.
2. The MediaDecoder manages the high level state for downloading and
   playing back the video file.
3. Each MediaDecoder has a MediaDecoderStateMachine to manage the low
   level state for downloading and playing the video file. This has a
   thread for decoding video and audio and has complicated logic for
   controlling buffering, thread lifetime, ensuring video and audio
   decode are kept in time, managing the queues of decoded video and
   audio samples and ensuring they get sent to the rendering pipelines
   on time. That's a complicated class; you don't want to mess with
   it if you don't need to!

I don't think that we need a WebVTTDecoder class that inherits from MediaDecoder. We don't need a WebVTT decoding object that has a MediaDecoderStateMachine, and we don't need all the logic for managing audio/video frame decoding: WebVTT doesn't have video and audio samples!



Regarding the "what should we do now?" question:

I've had a brief look at "Ralph's work in progress dump" patch in bug 629350, and it seems to be a pretty good start.

The W.I.P. patch already downloads the webvtt file using an nsIChannel. This is good. The data is delivered to you incrementally, in chunks, in the callback HTMLTrackElement::LoadListener::OnDataAvailable(). In that function you should pass the data off to the incremental webvtt parser; when cues are ready, you can construct the TextTrackCue objects and attach them to their owning TextTrack by calling the C++ implementation of TextTrack.addCue().

nsHTMLMediaElement::FireTimeUpdate(bool) then needs to use the TextTrackList API to query whether it should change the cue displayed on the screen.
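
To make the time-update query concrete, here is a minimal, self-contained sketch of deciding whether the displayed cues need to change. It does not use the real Gecko or TextTrackList classes: Cue and ActiveCueTracker are hypothetical names invented for illustration, and the rule that a cue is active while start <= t < end is my reading of the spec, so check it before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TextTrackCue: times in seconds plus the text.
struct Cue {
  double start;
  double end;
  std::string text;
};

// Hypothetical helper that something like FireTimeUpdate() could consult.
// It remembers which cues were active on the previous call.
class ActiveCueTracker {
 public:
  // Returns true when the set of active cues differs from the previous call,
  // i.e. when the overlay on the video frame would need to be redrawn.
  bool Update(const std::vector<Cue>& cues, double currentTime) {
    assert(currentTime >= 0.0);  // playback position is never negative
    std::vector<std::size_t> active;
    for (std::size_t i = 0; i < cues.size(); ++i) {
      // Assumed rule: active from start (inclusive) to end (exclusive).
      if (cues[i].start <= currentTime && currentTime < cues[i].end) {
        active.push_back(i);
      }
    }
    if (active == mActive) {
      return false;  // same cues as last time, nothing to do
    }
    mActive = std::move(active);
    return true;
  }

  // Indices (into the cue list) of the cues active after the last Update().
  const std::vector<std::size_t>& Active() const { return mActive; }

 private:
  std::vector<std::size_t> mActive;
};
```

Calling Update() on every time update keeps the redraw work proportional to actual cue changes. The linear scan is fine for a sketch; a real implementation would probably keep the cues sorted by start time.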

I think that you should create a WebVTTParser object that manages the webvtt_parser_t, and manage the parsing from there. This can be owned (i.e. created, destroyed, and the owning reference/pointer held) by the HTMLTrackElement. Even better, we can just turn the existing HTMLTrackElement::LoadListener into this, as then we're right on the receiving end of the incoming unparsed data.

So I'd recommend your list of things to do is this, building on top of Ralph's patch:

1. Rename HTMLTrackElement::LoadListener to WebVTTParser, and split it
   out into its own file. Or have the LoadListener forward
   OnDataAvailable/OnStopRequest calls to the WebVTTParser.
2. Change HTMLTrackElement to store a nsRefPtr<WebVTTParser> reference
   to the parser that you create in HTMLTrackElement::LoadResource().
3. WebVTTParser::OnDataAvailable() is currently creating and destroying
   a new webvtt_parser_t on every call. OnDataAvailable is called
   multiple times, every time we have a new chunk of the file
   downloaded. So we should instead create the webvtt_parser_t once per
   WebVTTParser, say in the constructor, or in a new Init() method.
4. Change WebVTTParser::OnDataAvailable() to use the parser created
   from step 3 incrementally and parse the chunk of data that was just
   downloaded.
5. Extract the cues from the webvtt_parser_t (they're reported in a
   callback, right?) and use the TextTrack.addCue() API to add them to
   the appropriate TextTrack object.
6. Change nsHTMLMediaElement::FireTimeUpdate() to query the
   TextTrackList and update the cue being displayed on the video frame
   in a timely fashion.
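
Steps 3 to 5 above can be sketched roughly as follows. This is a toy, self-contained illustration, not the real webvtt library API or the Gecko classes: Cue, TextTrack, AddCue and the cue callback are hypothetical stand-ins, and the "parser" only understands "MM:SS.mmm --> MM:SS.mmm" timing lines followed by text and a blank line. The point is the shape: parser state is created once, every chunk is fed into the same state, and finished cues come out through a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TextTrackCue.
struct Cue {
  double start;
  double end;
  std::string text;
};

// Hypothetical stand-in for TextTrack; AddCue mirrors TextTrack.addCue().
struct TextTrack {
  std::vector<Cue> cues;
  void AddCue(Cue cue) { cues.push_back(std::move(cue)); }
};

// Toy incremental parser playing the role of the object that owns the
// webvtt_parser_t. Header lines and cue identifiers are simply ignored.
class WebVTTParser {
 public:
  using CueCallback = std::function<void(Cue)>;

  // Step 3: the parser state is created once here, not once per chunk.
  explicit WebVTTParser(CueCallback onCue) : mOnCue(std::move(onCue)) {
    assert(mOnCue != nullptr);  // without a callback every cue would be lost
  }

  // Step 4: called once per downloaded chunk. A chunk can end mid-line, so
  // incomplete trailing data stays buffered until the next call.
  void OnDataAvailable(const std::string& chunk) {
    mPending += chunk;
    std::size_t newline;
    while ((newline = mPending.find('\n')) != std::string::npos) {
      std::string line = mPending.substr(0, newline);
      mPending.erase(0, newline + 1);
      if (!line.empty() && line.back() == '\r') line.pop_back();
      HandleLine(line);
    }
  }

  // Called when the download finishes: flush whatever is still buffered.
  void OnStopRequest() {
    if (!mPending.empty()) {
      HandleLine(mPending);
      mPending.clear();
    }
    FinishCue();
  }

 private:
  void HandleLine(const std::string& line) {
    if (line.empty()) {  // a blank line terminates the current cue
      FinishCue();
      return;
    }
    int m1 = 0, m2 = 0;
    double s1 = 0.0, s2 = 0.0;
    if (!mInCue && std::sscanf(line.c_str(), "%d:%lf --> %d:%lf",
                               &m1, &s1, &m2, &s2) == 4) {
      mCurrent = Cue{m1 * 60 + s1, m2 * 60 + s2, ""};
      mInCue = true;
    } else if (mInCue) {
      if (!mCurrent.text.empty()) mCurrent.text += '\n';
      mCurrent.text += line;
    }
  }

  // Step 5: report the finished cue to whoever owns the TextTrack.
  void FinishCue() {
    if (mInCue) {
      mOnCue(mCurrent);
      mInCue = false;
    }
  }

  CueCallback mOnCue;
  std::string mPending;  // bytes received but not yet terminated by '\n'
  Cue mCurrent{0.0, 0.0, ""};
  bool mInCue = false;
};
```

An owner (the HTMLTrackElement in the plan above) would construct the parser once with a callback such as [&track](Cue cue) { track.AddCue(std::move(cue)); }, call OnDataAvailable() for each chunk, and call OnStopRequest() when the load completes.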

Once you've got that working, you then need to check the spec [1] and ensure the right text tracks are being loaded. It looks to me like we currently load text tracks for every <track> element with a src attribute that is added to a document, but aren't we only supposed to load the <track> elements with a "default" attribute?

Ralph, does that plan sound reasonable?

That should keep you guys busy for a while. If you've got any questions, please don't hesitate to ask here or on IRC.


Cheers,
Chris Pearce.

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#the-track-element



On 25/01/2013 12:42 p.m., Rick Eyre wrote:
Hey all,
Myself and Shayan (reyre and ShayanZafar on IRC) will soon be beginning the 
implementation of the TextTrackDecoder for WEBVTT. I've noticed that on the 
first bug that Ralph filed there was some deliberation on the way that this 
should be done. Chris mentioned this in one of his comments 
(https://bugzilla.mozilla.org/show_bug.cgi?id=629350#c29). Due to this we're 
not really sure which approach is the best way. I was wondering if Chris or 
Ralph, or anyone, could give us a high level view of what needs to be done in 
order to accomplish this as we are new to the code and are still struggling to 
learn it.
Any help would be much appreciated.
Thank you for your time and patience,
Rick
