Re: [whatwg] Comment Syntax and Parsing
On Tue, 24 Jan 2006, Lachlan Hunt wrote:
> As for how to parse it, I'll use these test cases to demonstrate what I consider to be the most sane way to handle comments. (Assume EOF at the end of each one)
>
> Test Case | Comment Content | Output
> ---|--|--
> PA!SS | | PASS
> PA! -SS | - | PASS
> PA! --SS | | PASS
> PA!-SS | - | PASS
> PA!- -SS | - - | PASS
> PA!- -SS -- | - - | PASS --

Agreed.

> PA!- !--SS -- | - ! | PASS --

Comment should be - !-- IMHO. It's still a bogus comment (in HTML5 nomenclature), the -- part is irrelevant.

> PA!- !-- -SS -- | - !-- - | PASS --

Agreed.

> PA!- --SS | - | PASS
> PA!- -- SS | - | PASS

These are bogus comments, so again, they should be - -- and - -- respectively, IMHO.

> PA!-- FAIL --SS | FAIL | PASS
> PA!-- FAIL --SS | FAIL | PASS
> PA!-- FAIL !-- --SS | FAIL !-- | PASS
> PA!-- FAIL !-- -- --SS | FAIL !-- -- | PASS

Agreed.

> PA!-- FAIL -- SS | FAIL | PASS

Disagree. The terminator should be -->, not -- S* >. I don't see any good reason to have -- S* >.

> P!-- -- AS!-- --S | (2 comments) | PASS

Disagree (same reason). -- AS!-- is the comment, output is PS.

> PA!-- FAIL -- FAIL --SS | FAIL -- FAIL | PASS
> P!-- -- --AS!-- -- --S | -- (2 comments) | PASS
> PA!-- -- -- --SS | -- -- | PASS
> PA!-- FAIL -- FAIL -- FAIL --SS | FAIL -- FAIL -- FAIL | PASS
> PA!--- FAIL --SS | - FAIL | PASS
> PA!--- FAIL ---SS | - FAIL - | PASS
> !-- -FAIL | -FAIL |
> !--- -FAIL | - -FAIL |
> PA!-SS | - | PASS

Agreed.

> !-- --- - | (not sure) |

Comment text is --- -.

> PA!-- --- --SS | --- | PASS
> PA!--- --- ---SS | - --- - | PASS

Agreed.

-- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Comment Syntax and Parsing
Ian Hickson wrote:
> On Tue, 24 Jan 2006, Lachlan Hunt wrote:
>> PA!- !--SS -- | - ! | PASS --
>
> Comment should be - !-- IMHO. It's still a bogus comment (in HTML5 nomenclature), the -- part is irrelevant.

Ok, so if a comment only starts with '<!' then it ends at the first '>' only (ignoring any '--'), but if a comment starts with '<!--' then it must end with '-->'.

>> PA!- --SS | - | PASS
>> PA!- -- SS | - | PASS
>
> These are bogus comments, so again, they should be - -- and - -- respectively, IMHO.

Ok.

>> PA!-- FAIL -- SS | FAIL | PASS
>
> Disagree. The terminator should be -->, not -- S* >. I don't see any good reason to have -- S* >.

I was working on the assumption that the comment would end at the first occurrence of '>' while in the comment end state, but that whitespace would be ignored while searching for it. Several browsers already handle it like that, including Mozilla, Opera and Safari (except in Opera, the comment contained FAIL -). IE, OmniWeb and iCab failed, however.

-- Lachlan Hunt http://lachy.id.au/
Re: [whatwg] Comment Syntax and Parsing
On Wed, 25 Jan 2006, Lachlan Hunt wrote:
> Ian Hickson wrote:
>> On Tue, 24 Jan 2006, Lachlan Hunt wrote:
>>> PA!- !--SS -- | - ! | PASS --
>>
>> Comment should be - !-- IMHO. It's still a bogus comment (in HTML5 nomenclature), the -- part is irrelevant.
>
> Ok, so if a comment only starts with '<!' then it ends at the first '>' only (ignoring any '--'), but if a comment starts with '<!--' then it must end with '-->'.

Right. They end up in different parse states (bogus comment or bogus tag or something, vs comment or something). This is for compatibility with existing UAs -- basically it's not a comment really, just a malformed tag that happens to be turned into a Comment node in the DOM.

>>> PA!-- FAIL -- SS | FAIL | PASS
>>
>> Disagree. The terminator should be -->, not -- S* >. I don't see any good reason to have -- S* >.
>
> I was working on the assumption that the comment would end at the first occurrence of '>' while in the comment end state, but that whitespace would be ignored while searching for it. Several browsers already handle it like that including Mozilla, Opera and Safari (except in Opera, the comment contained FAIL -). Although IE, OmniWeb and iCab failed.

Really? In my testing, browsers didn't reliably do this. Were you testing standards mode or quirks mode? Did you have the potential to be hitting unexpected-EOF-reparse behaviour, or was it definitely the first-parse behaviour?

-- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
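
The two parse states described in this message can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the archive stripped the angle brackets from the test cases (so that `PA!- !--SS --` originally read `PA<!- <!-->SS -->`), the function name `split_comments` is hypothetical, and special cases such as `<!-->` are not handled. It is not the HTML5 tokenizer.

```python
def split_comments(src: str):
    """Split src into (rendered_text, comments) using the two rules above.

    Assumed rules (a sketch, not the spec):
      * "<!--" opens a real comment, which ends at the first "-->".
      * any other "<!" opens a bogus comment, which ends at the first ">".
      * an unclosed comment of either kind runs to end of input (EOF).
    """
    text, comments = [], []
    i = 0
    while i < len(src):
        if src.startswith("<!--", i):
            start, closer = i + 4, "-->"
        elif src.startswith("<!", i):
            start, closer = i + 2, ">"
        else:
            text.append(src[i])
            i += 1
            continue
        end = src.find(closer, start)
        if end == -1:
            # No terminator found: the comment swallows the rest of the input.
            comments.append(src[start:])
            break
        comments.append(src[start:end])
        i = end + len(closer)
    return "".join(text), comments
```

With the assumed original markup, `split_comments("PA<!- <!-->SS -->")` yields the comment text `- <!--` and the rendered text `PASS -->`, which is the reading argued for above.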
Re: [whatwg] Comment Syntax and Parsing
Ian Hickson wrote:
> On Wed, 25 Jan 2006, Lachlan Hunt wrote:
>> Ian Hickson wrote:
>>> On Tue, 24 Jan 2006, Lachlan Hunt wrote:
>>>> PA!-- FAIL -- SS | FAIL | PASS
>>>
>>> Disagree. The terminator should be -->, not -- S* >. I don't see any good reason to have -- S* >.
>>
>> I was working on the assumption that the comment would end at the first occurrence of '>' while in the comment end state, but that whitespace would be ignored while searching for it. Several browsers already handle it like that including Mozilla, Opera and Safari (except in Opera, the comment contained FAIL -). Although IE, OmniWeb and iCab failed.
>
> Really? In my testing, browsers didn't reliably do this. Were you testing standards mode or quirks mode? Did you have the potential to be hitting unexpected-EOF-reparse behaviour, or was it definitely the first-parse behaviour?

I tested the following in the live dom viewer using Firefox 1.5.0.1 Win and Mac, Opera 8.5/Mac, Opera 9 Win and Mac, Safari 2.0.3, IE6, OmniWeb 5.1.2 and iCab 3.0.1.

!DOCTYPE html PA!-- FAIL -- SS

Browser | Comment | Rendered
--|-|---
Firefox | FAIL | PASS
Opera 8.5/Mac | FAIL - | PASS
Opera 9.0/Mac | FAIL | PASS
Opera 9.0/Win | FAIL | PASS
Safari | (not shown) | PASS
IE6 | (not shown) | PA FAIL -- SS
iCab | (not shown) | PA FAIL -- SS
OmniWeb | (not shown) | PA FAIL -- SS

(The live dom viewer didn't work for OmniWeb, so I used an HTML file instead.)

-- Lachlan Hunt http://lachy.id.au/
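
The disagreement over the terminator can be made concrete with two regular expressions. This is a hedged sketch, not a model of any browser: it assumes the test markup originally read `PA<!-- FAIL -- >SS` before the archive stripped the angle brackets, the names `STRICT`, `LENIENT`, `render` and `comments` are invented for illustration, and treating an unclosed comment as running to end of input is an assumption of the sketch.

```python
import re

# Strict rule: only "-->" closes a comment.
STRICT = re.compile(r"<!--(.*?)(?:-->|\Z)", re.DOTALL)
# Lenient rule: whitespace is allowed between "--" and ">".
LENIENT = re.compile(r"<!--(.*?)(?:--\s*>|\Z)", re.DOTALL)


def render(src: str, terminator: re.Pattern) -> str:
    """Return src with every comment removed under the given rule."""
    return terminator.sub("", src)


def comments(src: str, terminator: re.Pattern) -> list:
    """Return the text of each comment found under the given rule."""
    return [m.group(1) for m in terminator.finditer(src)]
```

Under the lenient pattern the assumed test case renders as `PASS` with comment text ` FAIL `, in line with the Firefox row of the table. Under the strict pattern the comment is never closed, so in this sketch it swallows the rest of the input and only `PA` is rendered.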
Re: [whatwg] Comment Syntax and Parsing
Ian Hickson wrote:
> On Wed, 25 Jan 2006, Lachlan Hunt wrote:
>> I tested the following in the live dom viewer using Firefox 1.5.0.1 Win and Mac, Opera 8.5/Mac, Opera 9 Win and Mac, Safari 2.0.3, IE6, OmniWeb 5.1.2 and iCab 3.0.1.
>>
>> !DOCTYPE html PA!-- FAIL -- SS
>
> This triggers SGML comment parsing mode (which you don't want to be testing) in a number of browsers.

Why? The closer we can define the behaviour to be compatible with existing standards mode behaviours, the better it will be for backwards compatibility?

-- Lachlan Hunt http://lachy.id.au/
Re: [whatwg] Comment Syntax and Parsing
On Wed, 25 Jan 2006, Lachlan Hunt wrote:
> Ian Hickson wrote:
>> On Wed, 25 Jan 2006, Lachlan Hunt wrote:
>>> I tested the following in the live dom viewer using Firefox 1.5.0.1 Win and Mac, Opera 8.5/Mac, Opera 9 Win and Mac, Safari 2.0.3, IE6, OmniWeb 5.1.2 and iCab 3.0.1.
>>>
>>> !DOCTYPE html PA!-- FAIL -- SS
>>
>> This triggers SGML comment parsing mode (which you don't want to be testing) in a number of browsers.
>
> Why? The closer we can define the behaviour to be compatible with existing standards mode behaviours, the better it will be for backwards compatibility?

This entire discussion started from the developers of all the browsers who implemented the SGML comment mode coming to me and telling me I was stupid for even suggesting that this is how comments should be parsed. The whole point of all this is to simplify comment parsing.

-- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Re: [whatwg] Comment Syntax and Parsing
Quoting Lachlan Hunt [EMAIL PROTECTED]:
>> This entire discussion started from the developers of all the browsers who implemented the SGML comment mode coming to me and telling me I was stupid for even suggesting that this is how comments should be parsed. The whole point of all this is to simplify comment parsing.
>
> Yes, and I agree with that. But, besides Mozilla, which of those browser versions that I tested actually have SGML comments enabled?

Opera 9 I assume. If I remember correctly the SGML thing was fixed before the preview. We currently plan on going back to normal comment handling for the moment. So you could use Opera 8.5 if you do not want SGML comment handling.

-- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] Comment Syntax and Parsing
Also sprach Ian Hickson:
>>> This triggers SGML comment parsing mode (which you don't want to be testing) in a number of browsers.
>>
>> Why? The closer we can define the behaviour to be compatible with existing standards mode behaviours, the better it will be for backwards compatibility?
>
> This entire discussion started from the developers of all the browsers who implemented the SGML comment mode coming to me and telling me I was stupid for even suggesting that this is how comments should be parsed. The whole point of all this is to simplify comment parsing.

Right. And since I run out of memory trying to parse a sentence with the words "simple" and "SGML" in it... Oops. Core dumped.

-håkon
Re: [whatwg] Comment Syntax and Parsing
On Jan 23, 2006, at 05:23, Ian Hickson wrote:
> Probably the same as XML. Or maybe just !-- followed by zero or more characters other than U+, followed by --.

Of those two choices, I prefer the former. I don't like the idea of expanding the set of conforming comments, because I think conforming comments should maximize backwards compatibility (and there are browsers in the wild that implement SGML-style comments, which is incompatible with the latter alternative above).

I think allowing paired double hyphens with whitespace in between and allowing whitespace between the ending -- and > would make sense. This would improve the source-level upgradeability of valid HTML 4 to conforming HTML 5. However, it would have the old confusion issues.

!-- I think this should be conforming. --
!-- Making -- -- this conforming would make sense as well. --
!-- IMO, this -- should not be conforming but should parse unambiguously with an easy parse error. --

-- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: [whatwg] Comment Syntax and Parsing
Quoting Henri Sivonen [EMAIL PROTECTED]:
> I think allowing paired double hyphens with whitespace in between and allowing whitespace between the ending -- and > would make sense. This would improve the source-level upgradeability of valid HTML 4 to conforming HTML 5. However, it would have the old confusion issues.
>
> !-- I think this should be conforming. --
> !-- Making -- -- this conforming would make sense as well. --
> !-- IMO, this -- should not be conforming but should parse unambiguously with an easy parse error. --

And then it would be necessary to make this one non-conforming:

!-- In comment -- -- Not in HTML 5 comment but in SGML comment --

> I guess the XML style is the simplest thing that could work. :-/

You are talking about conformance, but what do you want the parser to do? Also, there is talk about whitespace between -- and > but currently all kinds of characters are allowed there (including - for instance).

-- Anne van Kesteren http://annevankesteren.nl/
Re: [whatwg] Comment Syntax and Parsing
On Mon, 23 Jan 2006, Lachlan Hunt wrote:
> Well that depends on the implementation and how SGML defines that such erroneous comments be handled.

Indeed, there is that too. Whatever behaviour we require will be, to some extent, new behaviour.

> (Without a copy of ISO 8879 handy, it's difficult to check, so the following is based purely on observing the implementations.)

ISO 8879:1986 (including its 1996 and 1998 annexes) doesn't cover, as far as I can tell, error handling requirements for parsers.

> Do you know if browsers will be using this for both standards and quirks mode or will they retain their existing quirks mode parsing and use this as the new standards mode parsing only?

I imagine that any changes to quirks mode handling will be done very carefully over an extended period of time.

> Well, many authors believe they're using XHTML, and many even believe they're using the correct XHTML MIME Type (using meta), even though they're not. So, regardless of whether they actually are or not, they're going to believe they are and it's best not to confuse them more by saying: "! isn't well-formed XML"

Fair enough. I've made it a parse error (which is what determines what conformance checkers must say regarding valid vs invalid syntax).

> ...have them come back and say: "the validator says it's fine" and then tell them: "that's because the document isn't XHTML." only to hear: "Yes it is, look at the meta element and all these slashes (<br/>)"

<br/> will also be flagged as a parse error, for what it's worth.

On Mon, 23 Jan 2006, Henri Sivonen wrote:
> [...]

By the way, Henri, thanks for your comments a few months back about parsing. I've been using them, and have agreed and implemented most of them in the spec so far. I'll reply to them in more detail in due course.

> I think allowing paired double hyphens with whitespace in between [would make sense]

That seems like excessive complexity for conformance checkers, with very little benefit (beyond the theoretical).

> and allowing whitespace between the ending -- and > would make sense.

This also seems a little gratuitous.

> This would improve the source-level upgradeability of valid HTML 4 to conforming HTML 5. However, it would have the old confusion issues.

I think those issues outweigh the benefits you mention.

> I guess the XML style is the simplest thing that could work. :-/

I agree. :-)

-- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
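
For reference, the "XML style" agreed on here can be checked with a short pattern. The sketch below follows the XML 1.0 comment production (`<!--`, then content containing no `--` and not ending in `-`, then `-->`); the function name `is_xml_style_comment` is hypothetical, and whether HTML5 adopts exactly this rule is not settled by this message.

```python
import re

# XML 1.0 comment production: '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# The content may contain single hyphens but never two in a row, and it may
# not end with a hyphen (which would produce "--->").
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_xml_style_comment(markup: str) -> bool:
    """Return True if markup is exactly one XML-style comment."""
    return XML_COMMENT.fullmatch(markup) is not None
```

Under this rule a comment with paired double hyphens inside it, or with whitespace between the closing `--` and `>`, is rejected, which is the simplification being argued for.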
Re: [whatwg] Comment Syntax and Parsing
On Jan 23, 2006, at 11:39, Anne van Kesteren wrote:
>> I guess the XML style is the simplest thing that could work. :-/
>
> You are talking about conformance, but what do you want the parser to do?

I talked about conformance, because I'd prefer document conformance be defined in such a way that conforming comments maximize compatibility with different parsers. I did not say anything about how I want non-conforming comments to be handled, because I think Hixie has researched the issue so much more than I that I don't have anything educated enough to say right now.

-- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: [whatwg] Comment Syntax and Parsing
On Mon, 23 Jan 2006, Lachlan Hunt wrote:
> Well, for what it's worth, I still don't think you were being stupid, I think you were right all along and had this been implemented by more than just Mozilla 7 years ago, the result may have been different.

Authors find the -- thing unbelievably confusing. Why does:

!-- Hello -- World -- How does comment work? -- I don't know. -- Do you? --

...work, but this:

!-- Hello World -- How does comment work? -- I don't know. -- Do you? --

...or this:

!-- Hello -- World -- How does comment work? -- I don't know. Do you? --

...not? Authors just don't get it. It makes more sense when you have draconian error handling, but HTML doesn't.

> [...] all of those vendors have unanimously voted against implementing proper comment handling in favour of quirks-mode-style parsing, there really isn't a choice in the matter.

(What HTML5 says isn't really quirks mode comment parsing, it's even simpler.)

>> Probably the same as XML. Or maybe just !-- followed by zero or more characters other than U+, followed by --.
>
> I vote for keeping it very similar to XML, it'll be easier for authors only having to learn and remember one comment syntax.

Plus CSS's. Plus Javascript's. So three syntaxes, at least. ...and this is assuming they'll ever use XML.

>> Yeah. The question is do we really want to confuse people by telling them that their comment is invalid when they write: !-
>
> Yes, for backwards compatibility reasons.

Fair enough. We can always allow it later.

> Another question is, do we wish to continue allowing white space like this:
>
> !-- comment --
>
> I believe it's supported by all browsers without any difficulty

Actually, it isn't. In most browsers that I tested the above gets treated as an unclosed comment which is then re-parsed in close at first mode. Since we're dropping the re-parse mode (see earlier mails), this goes away with it.

You can test whether or not it's really supported by comparing these:

!-- -- -- EOF
!-- -- -- EOF
!-- -- EOF
!-- -- EOF

...in my script: http://software.hixie.ch/utilities/js/live-dom-viewer/

-- Ian Hickson U+1047E)\._.,--,'``.fL http://ln.hixie.ch/ U+263A/, _.. \ _\ ;`._ ,. Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
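
Why the first "Hello World" example behaves differently from the other two can be sketched with a lenient, toggle-style reading of SGML comment declarations: inside `<!` ... `>`, each `--` switches comment mode on or off, and `>` ends the declaration only while comment mode is off. The sketch rests on two assumptions: the examples originally contained the angle brackets and a `<comment>` tag that the archive stripped, and the function name `sgml_declaration_end` is invented here. A validating SGML parser would instead report an error for text that appears between comments.

```python
def sgml_declaration_end(src: str, start: int = 0):
    """Return the index just past the '>' that closes the declaration
    beginning at src[start], or None if it is still open at end of input.

    Lenient toggle-style reading (an assumption, not ISO 8879 verbatim):
    every "--" flips comment mode; ">" only counts outside comment mode.
    """
    if not src.startswith("<!", start):
        raise ValueError("expected '<!' at the start position")
    i = start + 2
    in_comment = False
    while i < len(src):
        if src.startswith("--", i):
            in_comment = not in_comment
            i += 2
        elif src[i] == ">" and not in_comment:
            return i + 1
        else:
            i += 1
    return None


# Assumed reconstructions of the three examples in the message above.
works = "<!-- Hello -- World -- How does <comment> work? -- I don't know. -- Do you? -->"
early = "<!-- Hello World -- How does <comment> work? -- I don't know. -- Do you? -->"
open_ = "<!-- Hello -- World -- How does <comment> work? -- I don't know. Do you? -->"
```

In this reading the first string closes exactly at its final `>`, the second closes early at the `>` of `<comment>` (so the rest leaks into the page), and the third is left in comment mode at end of input.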
Re: [whatwg] Comment Syntax and Parsing
Ian Hickson wrote:
> On Mon, 23 Jan 2006, Lachlan Hunt wrote:
>> Well, for what it's worth, I still don't think you were being stupid, I think you were right all along and had this been implemented by more than just Mozilla 7 years ago, the result may have been different.
>
> Authors find the -- thing unbelievably confusing.

Oh, yes, absolutely. I know, I've tried explaining it to some with varying degrees of success.

> Why does:
>
> !-- Hello -- World -- How does comment work? -- I don't know. -- Do you? --
>
> ...work,

Well that depends on the implementation and how SGML defines that such erroneous comments be handled. (Without a copy of ISO 8879 handy, it's difficult to check, so the following is based purely on observing the implementations.)

Mozilla will handle that entirely as a single comment, which is closed at the occurrence of -- at the end. onsgmls, however, (which is more likely to be closer to the SGML spec) will encounter the 'W' in 'World', which is outside of the comment, treat it as an erroneous unclosed comment declaration and implicitly close it. It will then drop the 'W' completely and continue on, treating comment as an unknown and unclosed element along the way (assuming an HTML doctype is used).

So, basically, none of those examples actually work, they just appear to work in some implementations.

> (What HTML5 says isn't really quirks mode comment parsing, it's even simpler.)

Ok, well then I don't have a clue how quirks mode parsing works, it's just too unpredictable. I'm glad this is going to be simpler. Do you know if browsers will be using this for both standards and quirks mode or will they retain their existing quirks mode parsing and use this as the new standards mode parsing only?

>>> Probably the same as XML. Or maybe just !-- followed by zero or more characters other than U+, followed by --.
>>
>> I vote for keeping it very similar to XML, it'll be easier for authors only having to learn and remember one comment syntax.
>
> Plus CSS's. Plus Javascript's. So three syntaxes, at least.

Yes, but authors don't confuse CSS and JavaScript as being the same language as HTML as often as they confuse HTML and XHTML as being the same.

> ...and this is assuming they'll ever use XML.

Well, many authors believe they're using XHTML, and many even believe they're using the correct XHTML MIME Type (using meta), even though they're not. So, regardless of whether they actually are or not, they're going to believe they are and it's best not to confuse them more by saying: "! isn't well-formed XML" and have them come back and say: "the validator says it's fine" and then tell them: "that's because the document isn't XHTML." only to hear: "Yes it is, look at the meta element and all these slashes (<br/>)"

>> Another question is, do we wish to continue allowing white space like this:
>>
>> !-- comment --
>>
>> I believe it's supported by all browsers without any difficulty
>
> Actually, it isn't. In most browsers that I tested the above gets treated as an unclosed comment which is then re-parsed in close at first mode.

You're right, but IE was the only browser that I could find which (in standards mode) treated it like that.

> Since we're dropping the re-parse mode (see earlier mails), this goes away with it.

OK.

-- Lachlan Hunt http://lachy.id.au/