OK, some comments back on the cue range design. Sorry for the
summer-vacation-induced delay in response!
At 1:00 +0000 12/06/08, Ian Hickson wrote:
> In the current HTML5 draft cue ranges are available using a DOM API.
> This way of doing ranges is less than ideal.
> First of all, it is hard to use. The ranges must be added by script,
> can't be supplied with the media, and the callbacks are awkward to
> handle. The only way to identify the range a received callback applies
> to is by creating not one but two separate functions for each range: one
> for enter, one for exit. While creating functions on demand is easy in
> JavaScript, it does fall under advanced techniques that most authors will
> be unfamiliar with.
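A minimal sketch of that difficulty, in plain JavaScript so it runs without a browser. `MockMediaElement`, its `addCueRange(className, start, end, pauseOnExit, enterCallback, exitCallback)` method and `setCurrentTime()` are hypothetical stand-ins, not the draft's actual interface; they assume only what the paragraph describes, namely enter and exit callbacks that are called with nothing identifying the range. Ranges are treated as half-open [start, end), which is also an assumption.

```javascript
// Hypothetical stand-in for a media element with a callback-based cue range API.
// Callbacks are invoked with no arguments, so they cannot tell which range fired.
class MockMediaElement {
  constructor() {
    this.currentTime = 0;
    this.ranges = [];
  }

  // pauseOnExit is stored but ignored in this sketch.
  addCueRange(className, start, end, pauseOnExit, enterCallback, exitCallback) {
    this.ranges.push({ className, start, end, pauseOnExit, enterCallback, exitCallback, active: false });
  }

  // Simulate playback reaching time t and fire any enter/exit callbacks.
  setCurrentTime(t) {
    this.currentTime = t;
    for (const r of this.ranges) {
      const inside = t >= r.start && t < r.end;
      if (inside && !r.active) {
        r.active = true;
        r.enterCallback();
      } else if (!inside && r.active) {
        r.active = false;
        r.exitCallback();
      }
    }
  }
}

const log = [];
const video = new MockMediaElement();
const chapters = [
  { name: "intro", start: 0, end: 5 },
  { name: "main", start: 5, end: 10 },
];

// The author must create one enter closure and one exit closure per range,
// each capturing the chapter name, just to know which range a callback is for.
for (const chapter of chapters) {
  const name = chapter.name;
  video.addCueRange("chapters", chapter.start, chapter.end, false,
    () => log.push("enter " + name),
    () => log.push("exit " + name));
}

video.setCurrentTime(1);
video.setCurrentTime(6);
video.setCurrentTime(12);
```

Every range costs two freshly created functions; without the captured `name`, the callbacks could not tell the ranges apart.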
One of the features proposed for the next version of the video API is
chapter markers and other embedded timed metadata, with corresponding
callbacks for authors to hook into. Would that resolve the problem you
mention?
It may be that if we can define a way to embed cue-range-generating
meta-data in the media resource, with an abstract 'api' to get it
out, we'd deal with the "only add by script" issue here, yes. The
others, not so much.
Using elements makes ranges identifiable, traversable and modifiable
by using familiar APIs and concepts. However, it is true that there
are other ways to get some of the same functionality. Unless the
elements have some non-scripting functionality (like linking) the
case is perhaps not totally compelling. Instantiating ranges from
custom markup using script is a possibility.
Overall, we remain concerned that typically it is the media author
who would define what the ranges are, not really the page or
particularly the script author. Media authors tend not to be happy
writing scripts.
> This kind of feature is also not available in all languages that might
> provide access to the DOM API.
JavaScript is really the only concern from HTML5's point of view; if other
languages become relevant, they should get specially-crafted APIs for
them when it comes to this kind of issue.
The problem is that the current API more or less requires use of
closures and currying except for trivial cases. We don't think that
is good API design even for languages that have them. Perhaps at the
very least a cookie could be passed?
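A sketch of what a cookie would buy, under the same caveats: `CookieMediaElement` and its trailing `cookie` argument are hypothetical, and the cookie is simply handed back to whichever callback fires. With that one change, a single enter handler and a single exit handler can serve every range, with no closures or currying.

```javascript
// Hypothetical variant of a callback-based cue range API in which each range
// carries an opaque "cookie" that is handed back to its callbacks.
class CookieMediaElement {
  constructor() {
    this.ranges = [];
  }

  // The cookie parameter is an assumption for illustration, not a real API.
  // className and pauseOnExit are accepted but unused in this sketch.
  addCueRange(className, start, end, pauseOnExit, enterCallback, exitCallback, cookie) {
    this.ranges.push({ start, end, enterCallback, exitCallback, cookie, active: false });
  }

  // Simulate playback reaching time t; ranges are half-open [start, end).
  setCurrentTime(t) {
    for (const r of this.ranges) {
      const inside = t >= r.start && t < r.end;
      if (inside !== r.active) {
        r.active = inside;
        (inside ? r.enterCallback : r.exitCallback)(r.cookie);
      }
    }
  }
}

const log = [];
// One enter handler and one exit handler serve every range.
function onEnter(cookie) { log.push("enter " + cookie); }
function onExit(cookie) { log.push("exit " + cookie); }

const video = new CookieMediaElement();
video.addCueRange("chapters", 0, 5, false, onEnter, onExit, "intro");
video.addCueRange("chapters", 5, 10, false, onEnter, onExit, "main");

video.setCurrentTime(1);
video.setCurrentTime(6);
```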
> Secondly, this mechanism is not very powerful. You can't do anything else
> with the ranges besides receiving callbacks and removing them. You can't
> modify them. They are not visible to scripts or CSS. You can't link to
> them. You can't link out from them.
I'm not sure what it would really mean to link to or from a range, unless
you turned the entire video into a link, in which case you can just wrap
the <video> in an <a href=""> element for the duration of the range, using
script.
Linking into a cue-range would be using its beginning or end as a
seek point, or its duration as a restricted view of the media ("only
show me cue-range called InTheBathroom"). Linking out of a cue-range
would be establishing a click-through URL that would be dispatched
directly if the user clicked on the media during that range
(dispatched without script). We agree that neither of these should
be in scope now, but it would be nice to have a framework that could
be extended to cover these, in future.
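Neither kind of link is proposed now, but a small sketch suggests why annotated ranges would be a natural base for both. The range records, their `id`, `start`, `end` and `href` fields, the example URL, and the two helper functions are all hypothetical; overlapping ranges are resolved by taking the first match, which is an arbitrary choice made here.

```javascript
// Hypothetical annotated ranges: an id for linking in, an optional href for linking out.
const ranges = [
  { id: "Opening", start: 0, end: 30, href: null },
  { id: "InTheBathroom", start: 30, end: 95, href: "http://www.example.org/bathrooms" },
];

// Linking in: a named range supplies a seek point and an end point that
// could restrict the view to just that range.
function resolveLinkIn(ranges, id) {
  const r = ranges.find((range) => range.id === id);
  return r ? { seekTo: r.start, stopAt: r.end } : null;
}

// Linking out: a click at media time t maps to the href of the first range
// that contains t and has one. A user agent could do this lookup itself, so
// page authors would not need per-range script.
function clickThroughAt(ranges, t) {
  const r = ranges.find((range) => t >= range.start && t < range.end && range.href);
  return r ? r.href : null;
}

const view = resolveLinkIn(ranges, "InTheBathroom"); // { seekTo: 30, stopAt: 95 }
const target = clickThroughAt(ranges, 40);           // "http://www.example.org/bathrooms"
const none = clickThroughAt(ranges, 10);             // null (Opening has no href)
```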
> Thirdly, a script is a somewhat strange place to define the ranges. A set
> of ranges usually relates closely to some particular piece of media
> content. The same set of ranges rarely makes much sense in the context
> of some other content. It seems that ranges should be defined or
> supplied along with the media content.
For in-band data, callbacks for chapter markers as mentioned earlier seem
like the best solution.
For out-of-band data, if the ranges are just intended to trigger script, I
don't think we gain much from providing a way to mark up ranges semi-
declaratively as opposed to just having HTML-based media players define
their own range markup and have them implement it using this API. It
wouldn't be especially hard.
This seems to conflict with the answer (1) above, doesn't it?
> Fourth, this kind of callback API is a pretty strange creature in the HTML
> specification. The only other callback APIs are things like setTimeout()
> and the new SQL API which don't have associated elements. Events are the
> callback mechanism for everything else.
Events use callbacks themselves, so it's not that unusual.
I don't really think events would be a good interface for this.
Consistency is good, but if one can come up with a better API, it's better
to use that than just be consistent for the sake of it.
It does seem strange that events are right in the spatial domain
(mouse enter/exit), but not in the temporal domain. Yet the basic
semantic of the English word "event", let alone the web meaning, is
pretty well exactly matched by what is happening here -- crossing a
temporal boundary! Events are well-known and design uniformity
suggests that they be used, if nothing else.
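To illustrate the analogy with mouse enter/exit, here is a sketch that models each range as a DOM-style `EventTarget` and fires an event when the playback position crosses its boundary. It uses the standard `EventTarget`, `Event`, `addEventListener` and `dispatchEvent` interfaces (available in browsers and in recent Node.js), but the `TimeRange` class, the `updatePlaybackPosition()` helper and the event names `enter` and `exit` are assumptions for illustration only.

```javascript
// Hypothetical model of a time range as an event target, mirroring how
// elements receive events when the pointer crosses a spatial boundary.
class TimeRange extends EventTarget {
  constructor(id, start, end) {
    super();
    this.id = id;
    this.start = start;
    this.end = end;
    this.active = false;
  }
}

// Called whenever the playback position changes; fires "enter" or "exit"
// on each range whose temporal boundary was crossed (ranges are [start, end)).
function updatePlaybackPosition(ranges, t) {
  for (const range of ranges) {
    const inside = t >= range.start && t < range.end;
    if (inside !== range.active) {
      range.active = inside;
      range.dispatchEvent(new Event(inside ? "enter" : "exit"));
    }
  }
}

const log = [];
const ranges = [new TimeRange("area1", 0, 5), new TimeRange("area2", 5, 10)];

// A single listener function is shared by every range:
// event.type says what happened and event.target says which range it was.
function logEvent(event) {
  log.push(event.type + " " + event.target.id);
}
for (const range of ranges) {
  range.addEventListener("enter", logEvent);
  range.addEventListener("exit", logEvent);
}

updatePlaybackPosition(ranges, 1);
updatePlaybackPosition(ranges, 6);
```

Because the event itself identifies the range, no per-range closures are needed.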
> In SMIL the equivalent concept is the <area> element which is
> used like this:
> <video src="http://www.example.org/CoolStuff">
>   <area id="area1" begin="0s" end="5s"/>
>   <area id="area2" begin="5s" end="10s"/>
> </video>
> This kind of approach has several advantages.
> * Ranges are defined as part of the document, in the context of a particular
> media stream.
I'm not sure why that is an advantage in the context of HTML.
Because it is declarative and 'close to' (or maybe later, even
within) the media resource.
> * This uses events, a more flexible and more appropriate callback
> mechanism.
I don't really see why the flexibility of events is useful here, and I
don't see why it's more appropriate.
But we ask the opposite: why is it compelling not to fit into the normal way of
> * The callbacks have a JavaScript object associated with them, namely a DOM
> element, which carries information about the range.
That's useful, yes. Should we include some data with the callback?
Yes, if we cannot agree on this proposal, then some sort of cookie or
ID should be associated with a cue range (a string name of the range,
for example).
We could include the class name, the start time, and the end time. Having
said that, it's easy to use currying here to hook callbacks that know what
they're expecting.
Currying is pretty advanced; we're already concerned about using
scripting at all!
> We would like to suggest a <timerange> element that can be used as a
> child of the <video> and <audio> elements.
It's not clear to me that this is solving any problems worth solving.
Well, we think we should first evaluate the two ways of doing this,
and then give weight, if appropriate, to the 'first written' way
(yours). We're technically still in WD so we should, if possible,
prefer the better solution.
Let's look at a few comparison axes:
Declarative or established by script? We prefer declarative, as we
think the most likely definers of what the cue-ranges are (as opposed
to how they are handled) are the media authors, not the page authors.
Events or callbacks? Since we see these as the temporal equivalent
of the spatial mouse events, we see events as the most natural
analog. They also have event identifiers, making it much easier to
have separate handlers for different ranges or events.
Provide a framework for talking about time-ranges for other purposes
such as linking in or out? Yes, annotated ranges like ours do
provide such a basis.
Makes the DTD and HTML5 spec. more complex? Yes, we agree that this
introduces another element into the spec., with all that implies.
* *
Here are some more general ideas (not all meshed together):
* stating that the abstract interface to a media resource includes
finding its 'cue ranges', and inserting them automatically, and the
definers of a media resource type (e.g. MPEG for MP4) can define
something like "property X maps to HTML5 cue ranges in the following
way" would be OK. But I think again, then, that they have to be
annotational, so that they can have an ID and make an event....
* adding a cookie/rangeID to the current API would help...
* adding an attribute to <source> called "annotations" which could
point at a variety of types, including at an XML file (to be defined)
which contains meta-data, cue-range definitions etc., as if they were
part of the media, would help move this out of the HTML5 but still
provide a uniform interface...
For example:
<source src="myMovie.mp4" annotations="myMovie-tags.xml" />
Then, if the annotations should be obtained from the media resource itself,
the notation
<source src="myMovie.mp4" annotations="myMovie.mp4" />
could be used, and
<source src="myMovie.mp4" />
would be equivalent.
We could even use
<source src="myMovie.mp4" annotations="" />
to explicitly defeat the retrieval of annotations.
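The attribute rules just described can be summarised as a small decision function. This is a hypothetical sketch: `resolveAnnotations()` and its return shape are invented for illustration, and it compares strings directly, whereas a real user agent would presumably resolve both URLs against the document base before comparing them.

```javascript
// Hypothetical resolution of the proposed "annotations" attribute on <source>.
// Returns where annotations should be read from:
//   { from: "media" }          read them from the media resource itself
//   { from: "url", url: ... }  read them from a separate resource
//   { from: "none" }           do not retrieve annotations at all
// `annotations` is null or undefined when the attribute is absent.
function resolveAnnotations(src, annotations) {
  if (annotations === null || annotations === undefined) {
    return { from: "media" }; // attribute absent: same as pointing at src
  }
  if (annotations === "") {
    return { from: "none" }; // empty attribute: explicitly defeat retrieval
  }
  if (annotations === src) {
    return { from: "media" }; // attribute points at the media resource itself
  }
  return { from: "url", url: annotations };
}

const external = resolveAnnotations("myMovie.mp4", "myMovie-tags.xml");
const inBand = resolveAnnotations("myMovie.mp4", "myMovie.mp4");
const implicit = resolveAnnotations("myMovie.mp4", null);
const disabled = resolveAnnotations("myMovie.mp4", "");
```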
(Such an "annotations" href might also help with associating metadata
with media resources, particularly when the same metadata should be
associated with a set of sources that differ in bitrate, codec, etc.).
--
David Singer
Apple/QuickTime