First, some background. The basic architecture of Firefox's video decoder
is described here:
http://blog.pearce.org.nz/2011/02/firefox-4-video-decoder-architecture.html
This is a bit out of date now: the "ns" prefixes have been removed, and
the "nsBuiltin" prefix has been replaced with "Media", i.e.
MediaDecoder, MediaDecoderStateMachine. You should read that blog post
to get a feel for the general neighbourhood you're working in.
The bits relevant to the "should we inherit from MediaDecoder?" question
are:
1. Each nsHTMLMediaElement has one MediaDecoder for decoding the video
file.
2. The MediaDecoder manages the high level state for downloading and
playing back the video file.
3. Each MediaDecoder has a MediaDecoderStateMachine to manage the low
level state for downloading and playing the video file. This has a
thread for decoding video and audio and has complicated logic for
controlling buffering, thread lifetime, ensuring video and audio
decode are kept in time, managing the queues of decoded video and
audio samples and ensuring they get sent to the rendering pipelines
on time. That's a complicated class; you don't want to mess with
it if you don't need to!
I don't think that we need to have a WebVTTDecoder class that inherits
from MediaDecoder. We don't need a WebVTT decoding object that has a
MediaDecoderStateMachine, and we don't need all the logic for managing
audio/video frame decoding. WebVTT doesn't have video and audio samples!
Regarding the "what should we do now?" question:
I've had a brief look at "Ralph's work in progress dump" patch in bug
629350, and it seems to be a pretty good start.
The W.I.P. patch is already downloading the webvtt file using an
nsIChannel. This is good. The data is delivered to you incrementally in
chunks in the callback
HTMLTrackElement::LoadListener::OnDataAvailable(). In that function you
should pass the data off to the incremental webvtt parser. When cues
are ready, you can construct the TextTrackCue objects and attach them
to their owning TextTrack by calling the C++ implementation of
TextTrack.addCue().
nsHTMLMediaElement::FireTimeUpdate(bool) then needs to use the
TextTrackList API to query whether it should change the cue displayed on
the screen.
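As a rough illustration of that query, here is a minimal standalone
sketch. Cue, Track, ActiveCuesAt() and TextToDisplay() are hypothetical
stand-ins I've made up for this example; they are not the real
TextTrackCue/TextTrackList classes, and the real API may look quite
different. The idea is only that on each time update you ask which cues
cover the current playback time.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a TextTrackCue; times are in seconds.
struct Cue {
  double start;
  double end;
  std::string text;
};

// Hypothetical stand-in for a TextTrack that cues get added to.
class Track {
public:
  void AddCue(const Cue& aCue) {
    assert(aCue.start <= aCue.end);
    mCues.push_back(aCue);
  }

  // Return the cues whose [start, end) interval contains aTime.
  std::vector<Cue> ActiveCuesAt(double aTime) const {
    std::vector<Cue> active;
    for (const Cue& cue : mCues) {
      if (cue.start <= aTime && aTime < cue.end) {
        active.push_back(cue);
      }
    }
    return active;
  }

private:
  std::vector<Cue> mCues;
};

// What a time-update handler might compute: the text to display at
// aTime, or an empty string if no cue is active.
std::string TextToDisplay(const Track& aTrack, double aTime) {
  std::string text;
  for (const Cue& cue : aTrack.ActiveCuesAt(aTime)) {
    if (!text.empty()) {
      text += "\n";
    }
    text += cue.text;
  }
  return text;
}
```

A linear scan like this is fine for a sketch; whether the real code
needs something smarter depends on how many cues a track holds.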
I think that you should create a WebVTTParser object that manages the
webvtt_parser_t, and manage the parsing from there. This can be owned
(i.e. created, destroyed, and the owning reference/pointer held) by the
HTMLTrackElement. Even better, we can just turn the existing
HTMLTrackElement::LoadListener into this, as then we're right on the
receiving end of the incoming unparsed data.
So I'd recommend the following list of things to do, building on top of
Ralph's patch:
1. Rename HTMLTrackElement::LoadListener to WebVTTParser, and split it
out into its own file. Or have the LoadListener forward
OnDataAvailable/OnStopRequest calls to the WebVTTParser.
2. Change HTMLTrackElement to store an nsRefPtr<WebVTTParser> reference
to the parser that you create in HTMLTrackElement::LoadResource().
3. WebVTTParser::OnDataAvailable() is currently creating and destroying
a new webvtt_parser_t on every call. OnDataAvailable is called
multiple times, once for every new chunk of the file that is
downloaded. So we should instead create the webvtt_parser_t once per
WebVTTParser, say in the constructor, or in a new Init() method.
4. Change WebVTTParser::OnDataAvailable() to use the parser created
in step 3 to incrementally parse the chunk of data that was just
downloaded.
5. Extract the cues from the webvtt_parser_t (they're reported in a
callback, right?) and use the TextTrack.addCue() API to add them to
the appropriate TextTrack object.
6. Change nsHTMLMediaElement::FireTimeUpdate() to query the
TextTrackList and update the cue being displayed on the video frame
in a timely fashion.
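To make steps 3 to 5 concrete, here is a minimal standalone sketch of
the "create the parser once, feed it every chunk" shape. Everything in
it is an assumption for illustration: LineParser is a toy stand-in for
webvtt_parser_t that reports complete lines rather than real cues,
WebVTTParserSketch is not the real listener class, and a plain vector
stands in for the TextTrack. It does not use the actual webvtt or
necko APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for webvtt_parser_t: it buffers input and reports each
// complete line through a callback. A real parser would report cues.
class LineParser {
public:
  using Callback = std::function<void(const std::string&)>;

  explicit LineParser(Callback aOnLine) : mOnLine(std::move(aOnLine)) {}

  // Feed one chunk. A line split across chunks is held in mPending
  // until the rest of it arrives.
  void Parse(const char* aData, std::size_t aLength) {
    assert(aData != nullptr || aLength == 0);
    mPending.append(aData, aLength);
    std::size_t newline;
    while ((newline = mPending.find('\n')) != std::string::npos) {
      mOnLine(mPending.substr(0, newline));
      mPending.erase(0, newline + 1);
    }
  }

  // Flush whatever is left once the download has finished.
  void Finish() {
    if (!mPending.empty()) {
      mOnLine(mPending);
      mPending.clear();
    }
  }

private:
  Callback mOnLine;
  std::string mPending;
};

// Hypothetical listener: the parser is created once, in the
// constructor, and reused by every OnDataAvailable() call.
class WebVTTParserSketch {
public:
  explicit WebVTTParserSketch(std::vector<std::string>& aTrack)
    : mParser([&aTrack](const std::string& aLine) {
        aTrack.push_back(aLine);  // stands in for TextTrack.addCue()
      }) {}

  void OnDataAvailable(const char* aData, std::size_t aLength) {
    mParser.Parse(aData, aLength);
  }

  void OnStopRequest() { mParser.Finish(); }

private:
  LineParser mParser;
};
```

If the parser were recreated inside OnDataAvailable(), the buffered
half-line in mPending would be thrown away between chunks, which is
the reason for keeping one parser per listener.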
Once you've got that working, you then need to check the spec [1] and
ensure the right text tracks are being loaded. It looks to me like we
just load text tracks for all <track> elements with a src attribute
that are added to a document. Are we only supposed to load the
<track> elements with a "default" attribute?
Ralph, does that plan sound reasonable?
That should keep you guys busy for a while. If you've got any questions,
please don't hesitate to ask here or on IRC.
Cheers,
Chris Pearce.
[1]
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#the-track-element
On 25/01/2013 12:42 p.m., Rick Eyre wrote:
Hey all,
Myself and Shayan (reyre and ShayanZafar on IRC) will soon be beginning the
implementation of the TextTrackDecoder for WEBVTT. I've noticed that on the
first bug that Ralph filed there was some deliberation on the way that this
should be done. Chris mentioned this in one of his comments
(https://bugzilla.mozilla.org/show_bug.cgi?id=629350#c29). Due to this we're
not really sure which approach is the best way. I was wondering if Chris or
Ralph, or anyone, could give us a high level view of what needs to be done in
order to accomplish this as we are new to the code and are still struggling to
learn it.
Any help would be much appreciated.
Thank you for your time and patience,
Rick