parse
/pärs/
Verb
Analyze (a sentence) into its component parts and describe their syntactic 
roles.

Yes, this definition refers to a human language, but it applies to computer 
languages in just the same way. A parser depends on grammar and syntax; it 
cannot parse without an understanding of them. If you can't figure out which 
part of a sentence is a pronoun, subject, object, or verb, you're going to 
have a pretty difficult time understanding what I'm saying when I'm talking 
to you. Admittedly, you're probably having a difficult time with that anyway. 
But yes, the syntax and grammar of a language are essentially the language 
itself.

"The syntax rules have nothing to do with parsing"

This is incorrect. The syntax/grammar of the input tells us exactly how to 
parse WebVTT input. The only thing it does not tell us is what extra rules 
(beyond those of the grammar) to apply.

It's very simple: the syntax/grammar gives us the mathematical model explaining 
how to process the input, whereas the parsing section gives a (very poorly 
thought out, naive) algorithm with some additional rules that are not found in 
the syntax/grammar. The algorithm outlined in that section is based on the 
syntax/grammar as well.

The only difference between the two is these extra rules, which, as I've 
explained at least a dozen times now, fit quite well into the code we have now.
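
To make the distinction concrete, here is a minimal sketch in C, assuming 
only the short "mm:ss.ttt" timestamp form: the grammar production maps 
directly onto a recognizer, and whatever extra rules the "parsing" section 
adds are layered on top of it afterwards.

    #include <ctype.h>

    /* Sketch only: a recognizer derived directly from the grammar's
     * (short-form) timestamp production. The parsing section's extra
     * rules would be applied after this check, not instead of it. */
    static int match_short_timestamp(const char *s)
    {
        /* grammar: DIGIT DIGIT ":" DIGIT DIGIT "." DIGIT DIGIT DIGIT */
        if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]))
            return 0;
        if (s[2] != ':')
            return 0;
        if (!isdigit((unsigned char)s[3]) || !isdigit((unsigned char)s[4]))
            return 0;
        if (s[5] != '.')
            return 0;
        return isdigit((unsigned char)s[6]) && isdigit((unsigned char)s[7]) &&
               isdigit((unsigned char)s[8]);
    }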

We don't want to ignore those parsing rules when we need to follow them 
(which, despite what you might believe, is not "all the time"). We are fully 
aware that there is an expected output, and we don't want to output things that 
other browsers wouldn't (unless the user asks us to).

Stop making these ridiculous, fallacious statements claiming that we are 
ignoring the draft. It is and has been our goal to create a library for 
operating on this format in a standards-compliant manner, and that is exactly 
what we aim to do.

That does not mean eliminating the possibility of using the library for 
different use cases. If we only cared about this single use case, there would 
be no point in writing a third-party library at all, because we could simply 
implement it directly in Gecko.

Other applications will have different uses for the library, and even the 
browser will want to use it in different ways, at different times, under 
different circumstances. That's generally just the way it is. We're writing 
code for something that will be used as a media player and as a 
development/testing platform. It's going to need a good deal more 
functionality than what is outlined in the "parsing" section of the draft.

For this reason, we are concerned with the draft in its entirety, not merely 
this one section that you are obsessed with. Yes, it is important; yes, we need 
to ensure that we can generate output that is compliant with it. But no, it is 
not the sole area of the draft that we need to look at in order to create a 
useful product. We are concerned with the entire thing, from syntax and grammar 
to rendering. There is no room for ignoring a section of it just because you 
feel the parsing algorithm it outlines is more important.

What is important is that we eliminate bugs, but the parsing spec is not the 
only part of the document that tells us what constitutes valid input. (In fact, 
valid input is defined by the syntax/grammar, which is what makes it valid in 
the first place.)
________________________________
From: Kyle Barnhart [[email protected]]
Sent: Tuesday, February 05, 2013 12:37 PM
To: Caitlin Potter
Cc: [email protected]
Subject: Re: WebVTT Parser Standards Compliance

Again.

"Those extra constraints are great and all (although, if we're honest, they're 
really not very well thought out), but a parsing algorithm is defined by the 
underlying grammar"
No. This has been made abundantly clear. The syntax rules have nothing to do 
with parsing, so any line of thought based on that faulty premise is flawed.


On Tue, Feb 5, 2013 at 8:07 AM, Caitlin Potter 
<[email protected]<mailto:[email protected]>> wrote:
You (once again) misunderstand what I've said, Kyle. I don't know how to put it 
in any simpler terms.

The "parsing" section of the specification (that is just ONE single aspect of a 
very long draft, which relates to an algorithm used when displaying track 
elements in the browser)

The only important thing about that parsing algorithm is to ensure that we can 
produce the same output.

Those extra constraints are great and all (although, if we're honest, they're 
really not very well thought out), but a parsing algorithm is defined by the 
underlying grammar. The "parsing" section contains some rules, which are easily 
followed. But forcing the library to follow them all the time means removing 
some other use cases for the library, and that's frankly just silly. And we 
can't realistically expect the algorithm they lay out to work in a browser, 
anyway. The other implementations we've looked at certainly don't.

The "parsing section" lays out some rules to follow, so we follow those rules 
(optionally, where possible). If you think we should implement a second parser 
for rendering only, that follows the algorithm outlined in the parsing section 
to the letter, then feel free to contribute code. But I think it will be 
somewhat difficult to implement that in a re-entrant fashion with small blocks 
of data at a time, if it's a large file.
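
To be concrete about what "re-entrant" means here, this is a minimal C sketch 
(hypothetical, not our actual API): all state lives in a context struct, so a 
token that is split across two reads is still handled correctly.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical sketch of re-entrant, chunk-at-a-time parsing: the
     * context carries all state between calls, so a CRLF split across
     * two chunks is still counted as a single line break. */
    typedef struct {
        unsigned lines;  /* line breaks seen so far */
        int saw_cr;      /* a '\r' ended the previous chunk */
    } chunk_ctx;

    static void feed_chunk(chunk_ctx *ctx, const char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            char c = buf[i];
            if (ctx->saw_cr) {       /* finish a CRLF begun last chunk */
                ctx->lines++;
                ctx->saw_cr = 0;
                if (c == '\n')
                    continue;        /* the '\n' belongs to that CRLF */
            }
            if (c == '\r')
                ctx->saw_cr = 1;     /* may be completed by the next chunk */
            else if (c == '\n')
                ctx->lines++;
        }
    }

    int main(void)
    {
        /* The same input, delivered in two awkwardly split chunks. */
        const char *a = "WEBVTT\r";
        const char *b = "\n\n00:01.780 --> 00:02.300\n";
        chunk_ctx ctx = {0, 0};
        feed_chunk(&ctx, a, strlen(a));
        feed_chunk(&ctx, b, strlen(b));
        printf("line breaks: %u\n", ctx.lines);  /* prints 3 */
        return 0;
    }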

________________________________
From: Kyle Barnhart [[email protected]]
Sent: Tuesday, February 05, 2013 12:24 AM
To: Caitlin Potter

Cc: [email protected]
Subject: Re: WebVTT Parser Standards Compliance

Your first point about standards has already been resolved. There is only one 
standard by which a WebVTT parser may be judged, and that is the parser rules 
in the specification. Any cue the parser outputs that does not meet the 
standard set in the parsing rules of the specification is by definition not 
standards-compliant. (See posts by Robert O'Callahan and L. David Baron, and 
quotes by Glenn Maynard, Simon Pieters, Velmont, Ian Hickson, and Ms2ger.)

Your statement about flexibility is a good one. Usually flexibility is a good 
thing, but when implementing a standard it is actually very bad. I only touched 
on this briefly, so let me explain more fully.

The purpose of a standard (such as WebVTT) is to ensure consistent behavior 
wherever the standardized format is used. This is done so that developers using 
the format only need to write one version of a file and can expect it to work 
the same wherever the format is used. They do not need to make multiple 
versions of the file for each setting, nor do they need to write complex code 
to make sure it behaves the same across settings.

Take how HTML renders in browsers as an example. Developers often have to 
write complex code so that their websites display correctly in different 
browsers, such as IE6. Fortunately this has improved over the last few years.

Another example is an image format. If implementations were allowed to deviate 
from the standard, then whoever made the image could not expect it to display 
the same in different programs. In some programs it might not display at all. 
This is why image formats have such standards, and it is the same reason we 
cannot deviate from the specification for WebVTT.

This is why I say there should be only one possible output of cues from the 
parser: the standards-compliant one.

Thank you,
Kyle Barnhart

Helpful Links:
http://www.w3.org/standards/about.html
http://www.webstandards.org/learn/faq/


On Mon, Feb 4, 2013 at 11:18 PM, Caitlin Potter 
<[email protected]<mailto:[email protected]>> wrote:
The parser does (or, let's say, "should") output standards-compliant cues. 
Everything is there: cue ID, start time, end time, the cue settings listed in 
the draft, and a tree of markup elements from the cue text.
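
As a rough sketch of what that amounts to (hypothetical C types, not the 
library's actual names):

    #include <stddef.h>

    /* Hypothetical sketch of the data a parsed cue carries; these are
     * not the library's real type names. */
    typedef struct cue_node cue_node;
    struct cue_node {
        const char *tag;        /* e.g. "b", "i", "v"; NULL for a text leaf */
        const char *text;       /* text content, for leaf nodes */
        cue_node  **children;   /* nested cue-text markup */
        size_t      n_children;
    };

    typedef struct {
        const char *id;          /* optional cue identifier */
        double      start_time;  /* seconds */
        double      end_time;    /* seconds */
        const char *settings;    /* cue settings, as listed in the draft */
        cue_node   *cue_text;    /* root of the markup tree */
    } vtt_cue;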

The thing that is "not standards-compliant" with regard to the unit tests is 
that the test FileParser implementation does not bother to implement the 
higher-level rules laid out in the parser section of the spec. This is because 
we want it to catch as many syntax errors as possible and read in as much of 
the input data as possible.

This does not mean that the spec is being "ignored"; it means that this code 
is designed to be flexible, so that it can be used for different applications 
with different needs.

As you've seen on the github repository, I've created some issues that should 
help us improve how easy it is to obtain the behaviour laid out in the parsing 
section. We can likely have that done by the next release, if we decide to do 
it in the manner that I've proposed.

Flexible does not mean "ignore the standard". Nobody on this project has 
suggested that we ignore or abandon the specification. Not a single person. 
Flexible means giving clients of the library the power to operate in ways that 
make sense for their usage. The spec doesn't instruct us to return syntax 
errors, or to be able to parse not-quite-valid input. But because these things 
are not expensive to do, and some applications will sometimes want them, it 
just doesn't make sense to leave them out.
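
To sketch the kind of flexibility I mean (the option names here are 
hypothetical, not our actual configuration):

    /* Hypothetical configuration sketch: the spec-algorithm behaviour is
     * one setting among several, not the only mode the library runs in. */
    typedef struct {
        int follow_parsing_section; /* drop cues exactly as the algorithm says */
        int report_syntax_errors;   /* surface grammar violations to the caller */
        int recover_partial_input;  /* keep reading past not-quite-valid data */
    } parser_options;

    /* A browser-style client wants the first; a validator wants the others. */
    static const parser_options browser_mode   = { 1, 0, 0 };
    static const parser_options validator_mode = { 0, 1, 1 };
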
________________________________
From: Kyle Barnhart [[email protected]]
Sent: Monday, February 04, 2013 10:54 PM
To: Caitlin Potter
Cc: Ralph Giles; [email protected]

Subject: Re: WebVTT Parser Standards Compliance

I have already agreed that I need to break up the patches and be more clear.

But that has nothing to do with whether the parser should be allowed to output 
non-standards-compliant cues, nor with whether there should be one set of tests 
or two.


On Mon, Feb 4, 2013 at 10:46 PM, Caitlin Potter 
<[email protected]<mailto:[email protected]>> wrote:
We agree that there are errors in these tests (however, I did not see that one 
mentioned in your list of changes, which is a good reason to file much smaller, 
more focused issues rather than massive pull requests).

But your changes have done a lot more than just this. You've removed tests, 
claiming that they're duplicates, without demonstrating that they are (e.g., 
claiming that "4 digits" and "greater than 999" are the same thing, with the 
same code to step through, when they are in fact not). And even if you do 
demonstrate that the code paths are the same, I think it's a good idea to keep 
such tests in place for now, because they don't test the same thing.

Removing tests, changing the expected number of cues (sometimes correctly, 
other times not so much), removing the comments regarding the syntax rules 
(which are essentially a guide through the parser code) and replacing them with 
the "relevant" section from the parser spec... things like this, I don't agree 
with. There are a huge number of changed files in your pull requests; I haven't 
been over all of them, but a few of these things have stood out.

It will be a lot simpler if we can avoid these massive patches (I'm guilty of 
this too), so that it's easier for other people to provide input on exactly 
what is correct and what isn't.

But never mind the "big patch" issue for now; the point is that we all agree 
that tests for the rules outlined in the "parsing" section are needed. The 
issue is that some of us believe these tests are different from the other tests 
(in that they will require a different FileParser implementation). They can 
still use the same data files, and even sit in the same folder. But some code 
changes really are necessary to make them test the things you want them to 
test.
________________________________________
From: dev-media-bounces+caitlin.potter=senecacollege...@lists.mozilla.org
 on behalf of Kyle Barnhart [[email protected]]
Sent: Monday, February 04, 2013 10:21 PM
To: Ralph Giles
Cc: [email protected]
Subject: Re: WebVTT Parser Standards Compliance

There are tests for every valid and invalid input we could think of. Let me
show a couple of examples of the changes I've made.

00:01.780 --> 00:02.300

That is a timing statement for a cue. The whitespace between the timestamp
and the arrow is required by the syntax rules but not required by the
parsing rules. So for tests where the whitespace is missing, I've changed
the expected number of cues from 0 to 1.

00:02.0005

The milliseconds are only allowed to have 3 digits. In the current tests it
is changed to 00:02.005. This is not allowed by either the syntax or the
parsing rules, and by the parsing rules the cue should be discarded. So I
changed the expected number of cues from 1 to 0.
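
To illustrate, here is roughly what those two checks amount to in C (a
sketch with hypothetical helpers, not the actual test code):

    #include <ctype.h>
    #include <string.h>

    /* Sketch of the two rules above: the fraction must be exactly three
     * digits under both rule sets, while whitespace before "-->" matters
     * only under the syntax rules. */

    /* Count the digits after the '.' in something like "00:02.0005". */
    static size_t fraction_digits(const char *ts)
    {
        const char *dot = strchr(ts, '.');
        size_t n = 0;
        if (!dot)
            return 0;
        while (isdigit((unsigned char)dot[1 + n]))
            n++;
        return n;  /* both rule sets require exactly 3 */
    }

    /* Under the parsing rules, whitespace before the arrow is optional. */
    static int arrow_follows(const char *p, int strict_syntax)
    {
        int saw_ws = 0;
        while (*p == ' ' || *p == '\t') {
            saw_ws = 1;
            p++;
        }
        if (strict_syntax && !saw_ws)
            return 0;  /* the syntax rules require the whitespace */
        return strncmp(p, "-->", 3) == 0;
    }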

These are only two of the many changes that had to be made to make the
tests correct according to the parsing rules. The other big change I made
is to reference the parsing rules being tested in the comments instead of
the syntax rules, since the syntax rules don't apply to a parser and the
parsing rules do. Otherwise I've made no changes to the intent of any test;
I have added many missing tests and removed duplicate tests. I have not
removed any debug error checks, and I have added many missing ones. In all,
the modified tests are more thorough and make sure the parser correctly
outputs cues that are standards-compliant.

What Caitlin is now arguing is that the parsing library should have two
settings, one that outputs non-standard cues and one that outputs standard
cues, with a set of tests for each. However, I can see no possible reason
ever to output non-standard cues. In fact, it is bad, dangerous, and a whole
lot of unnecessary work. The purpose of a standard is to make sure WebVTT
behaves the same in all settings. Outputting non-compliant cues directly
violates the standard. Allowing it can only serve to make WebVTT more
difficult to work with, and developers will not know how their file will
behave from one application to the next. Therefore it is far less work, and
far better, to have one set of standards-compliant tests and fix the parser
to those standards. And it is better to do it now than to go back and
re-engineer the thing later.

Thanks,
Kyle Barnhart


On Mon, Feb 4, 2013 at 8:53 PM, Ralph Giles 
<[email protected]<mailto:[email protected]>> wrote:

> On 13-02-04 5:20 PM, Caitlin Potter wrote:
>
> > The issue here is that Kyle insists on rewriting unit tests that are
> concerned with the "syntax specification", rather than adding new tests
> that are concerned with the "parser specification".
>
> Like Chris, I'm confused about what the contended issue is. To be clear, are
> we looking at the version of the webvtt spec at
> https://dev.w3.org/html5/webvtt/ ?
>
> This has several sections which seem relevant:
>
>  * 3.1 "Syntax" describes the WebVTT text format and includes a few
> parser descriptions, such as the one for timestamps.
>
>  * 3.2 "Parsing" describes the top-level parsing algorithm for WebVTT
> text files.
>
>  * 3.3 "Cue text parsing" describes a parsing algorithm for the contents
> of each cue, building a particular dom-ish data structure.
>
> Is one of these sections the "syntax specification" and another the
> "parser specification"? If so, where do they disagree? Can you give a
> specific example where one is more permissive or restrictive than another?
>
> The point of having a specific parser algorithm documented in the spec
> is to achieve uniform implementation. If everyone implements the
> algorithm (or its equivalent) the handling of edge cases will be more
> consistent than if everyone implements a parser from scratch based on
> some incomplete syntax description. So we should be implementing the
> parser the spec describes. If the spec is internally inconsistent we
> should file spec bugs and get the text fixed.
>
> Nevertheless, the current code doesn't pass--or even run to completion
> on--a number of the current tests, so it's difficult to tell what works
> and what doesn't. I think fixing that should be the highest priority for
> those working on the parser. Without tests we don't know where we stand.
>
> Kyle, Caitlin's suggestion that you provide a separate set of parser
> tests seems reasonable to me if she wants the current set for code
> coverage or additional features. The test sets can always be merged
> later if there's consensus that's appropriate. In the meantime you won't
> get in each other's way.
>
> - Ralph