Re: Implementing TextTrackDecoder

Chris Pearce Thu, 24 Jan 2013 19:07:02 -0800

First, some background. The basic architecture of Firefox's video decoder is described here:


http://blog.pearce.org.nz/2011/02/firefox-4-video-decoder-architecture.html

This is a bit out of date now: the "ns" prefixes have been removed, and the "nsBuiltin" prefix has been replaced with "Media", e.g. MediaDecoder, MediaDecoderStateMachine. You should read that blog post to get a feel for the general neighbourhood you're working in.

The bits relevant to the "should we inherit from MediaDecoder?" question are:


1. Each nsHTMLMediaElement has one MediaDecoder for decoding the video
   file.
2. The MediaDecoder manages the high level state for downloading and
   playing back the video file.
3. Each MediaDecoder has a MediaDecoderStateMachine to manage the low
   level state for downloading and playing the video file. This has a
   thread for decoding video and audio and has complicated logic for
   controlling buffering, thread lifetime, ensuring video and audio
   decode are kept in time, managing the queues of decoded video and
   audio samples and ensuring they get sent to the rendering pipelines
   on time. That's a complicated class; you don't want to mess with
   it if you don't need to!

I don't think that we need to have a WebVTTDecoder class that inherits from MediaDecoder. We don't need a WebVTT decoding object that has a MediaDecoderStateMachine, and we don't need all the logic for managing audio/video frame decoding. WebVTT doesn't have video and audio samples!




Regarding the "what should we do now?" question:

I've had a brief look at "Ralph's work in progress dump" patch in bug 629350, and it seems to be a pretty good start.

The W.I.P. patch is already downloading the webvtt file using an nsIChannel. This is good. The data is delivered to you incrementally, in chunks, in the callback HTMLTrackElement::LoadListener::OnDataAvailable(). In that function you should pass the data off to the incremental webvtt parser, and then when cues are ready you can construct the TextTrackCue objects and attach them to their owning TextTrack by calling the C++ implementation of TextTrack.addCue().

nsHTMLMediaElement::FireTimeUpdate(bool) then needs to use the TextTrackList API to query whether it should change the cue displayed on the screen.
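To make the time-update query a little more concrete, here is a minimal sketch in plain C++ of picking the cues that should be on screen at a given playback time. It is only an illustration: the Cue struct and ActiveCuesAt() are hypothetical names I've made up for this example, not the real TextTrackCue or TextTrackList API, and the real code would also need to handle cue ordering and rendering.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a TextTrackCue. Times are in seconds.
struct Cue {
  double start;
  double end;
  std::string text;
};

// Returns the cues that should be visible at currentTime, i.e. those
// with start <= currentTime < end. A caller could compare this result
// against what is currently displayed to decide whether to repaint.
std::vector<Cue> ActiveCuesAt(const std::vector<Cue>& cues,
                              double currentTime) {
  std::vector<Cue> active;
  for (const Cue& cue : cues) {
    // Sanity check: a cue must not end before it starts.
    assert(cue.start <= cue.end);
    if (cue.start <= currentTime && currentTime < cue.end) {
      active.push_back(cue);
    }
  }
  return active;
}
```

Something like this would be called on every time update with the media element's current playback position, and the display only changed when the returned set differs from the previous one.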

I think that you should create a WebVTTParser object that manages the webvtt_parser_t, and manage the parsing from there. This can be owned (i.e. created, destroyed, and the owning reference/pointer held) by the HTMLTrackElement. Even better, we can just turn the existing HTMLTrackElement::LoadListener into this, as then we're right on the receiving end of the incoming unparsed data.

So I'd recommend your list of things to do is this, building on top of Ralph's patch:


1. Rename HTMLTrackElement::LoadListener to WebVTTParser, and split it
   out into its own file. Or have the LoadListener forward
   OnDataAvailable/OnStopRequest calls to the WebVTTParser.
2. Change HTMLTrackElement to store a nsRefPtr<WebVTTParser> reference
   to the parser that you create in HTMLTrackElement::LoadResource().
3. WebVTTParser::OnDataAvailable() is currently creating and destroying
   a new webvtt_parser_t on every call. OnDataAvailable is called
   multiple times, every time we have a new chunk of the file
   downloaded. So we should instead create the webvtt_parser_t once per
   WebVTTParser, say in the constructor, or in a new Init() method.
4. Change WebVTTParser::OnDataAvailable() to use the parser created
   from step 3 incrementally and parse the chunk of data that was just
   downloaded.
5. Extract the cues from the webvtt_parser_t (they're reported in a
   callback, right?) and use the TextTrack.addCue() API to add them to
   the appropriate TextTrack object.
6. Change nsHTMLMediaElement::FireTimeUpdate() to query the
   TextTrackList and update the cue being displayed on the video frame
   in a timely fashion.
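
Steps 3 to 5 can be sketched roughly as follows, in self-contained C++. Treat this as a sketch under loud assumptions: ToyIncrementalParser is a made-up stand-in for webvtt_parser_t (it just splits the stream on blank lines and is not a real WebVTT parser), WebVTTParserSketch is a hypothetical name, and the callback stands in for the code that would build a TextTrackCue and call addCue(). The point is only the lifetime: the parser state is created once and survives across OnDataAvailable() calls.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for webvtt_parser_t. It treats a blank line
// ("\n\n") as the end of a cue block and keeps partial data buffered
// between calls, which is the property that matters here.
class ToyIncrementalParser {
 public:
  using CueCallback = std::function<void(const std::string&)>;

  explicit ToyIncrementalParser(CueCallback onCue)
      : mOnCue(std::move(onCue)) {}

  // Feed one chunk of the download. Every complete block is reported
  // through the callback; an incomplete tail stays in the buffer.
  void Parse(const char* data, std::size_t length) {
    mBuffer.append(data, length);
    std::size_t pos;
    while ((pos = mBuffer.find("\n\n")) != std::string::npos) {
      if (pos > 0) {
        mOnCue(mBuffer.substr(0, pos));
      }
      mBuffer.erase(0, pos + 2);
    }
  }

  // End of stream: report whatever is still buffered as a final block.
  void Finish() {
    if (!mBuffer.empty()) {
      mOnCue(mBuffer);
      mBuffer.clear();
    }
  }

 private:
  CueCallback mOnCue;
  std::string mBuffer;
};

// Hypothetical shape of the WebVTTParser from the list above. The
// underlying parser is created once, in the constructor (step 3), and
// reused for every chunk (step 4). Cues come out via the callback,
// where the real code would add them to the TextTrack (step 5).
class WebVTTParserSketch {
 public:
  explicit WebVTTParserSketch(ToyIncrementalParser::CueCallback addCue)
      : mParser(std::move(addCue)) {}

  void OnDataAvailable(const char* data, std::size_t length) {
    assert(!mStopped);  // No data is expected after OnStopRequest().
    mParser.Parse(data, length);
  }

  void OnStopRequest() {
    mStopped = true;
    mParser.Finish();
  }

 private:
  ToyIncrementalParser mParser;
  bool mStopped = false;
};
```

With this shape, a cue that straddles two network chunks is still reported exactly once, because the partial data lives in the long-lived parser rather than in a parser that is torn down at the end of each OnDataAvailable() call.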

Once you've got that working, you then need to check the spec [1] and ensure the right text tracks are being loaded. It looks to me like we just load text tracks for all <track> elements with a src attribute that are added to a document. Aren't we only supposed to load the <track> elements with a "default" attribute?


Ralph, does that plan sound reasonable?

That should keep you guys busy for a while. If you've got any questions, please don't hesitate to ask here or on IRC.



Cheers,
Chris Pearce.

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#the-track-element




On 25/01/2013 12:42 p.m., Rick Eyre wrote:

Hey all,
Myself and Shayan (reyre and ShayanZafar on IRC) will soon be beginning the 
implementation of the TextTrackDecoder for WEBVTT. I've noticed that on the 
first bug that Ralph filed there was some deliberation on the way that this 
should be done. Chris mentioned this in one of his comments 
(https://bugzilla.mozilla.org/show_bug.cgi?id=629350#c29). Due to this we're 
not really sure which approach is the best way. I was wondering if Chris or 
Ralph, or anyone, could give us a high level view of what needs to be done in 
order to accomplish this as we are new to the code and are still struggling to 
learn it.
Any help would be much appreciated.
Thank you for your time and patience,
Rick


_______________________________________________
dev-media mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-media
