On Thu, Jun 12, 2014 at 11:11 AM, Domenic Denicola <[email protected]> wrote:
> I guess part of it is clarifying which part of "<script>'s insane parsing
> rules" we're talking about. From what I'm aware of there are quite a lot of
> different insanities; but I am fuzzy on the details. Does anyone know which
> rules are inherently necessary, and which are historical accidents or
> constraints?
I'll recap the rules for "script data state" from
http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html#script-data-state

As a general rule, `\r` and `\r\n` are converted to `\n`, and `\0` is not allowed (it is a parse error, and the tokenizer emits U+FFFD in its place). The case-insensitive sequence `</script` followed by a character in `[ \t\r\n\f/>]` terminates the script data section. (These constraints would be present for any HTML embedding.)

In addition, the exact character sequence `<!--` switches to "escaped data" parsing. This is a bit hairy, and you can even end up in "double escaped" modes. See http://stackoverflow.com/questions/23727025/script-double-escaped-state for an example. Presumably these are the "insane parsing rules" under discussion. You are encouraged to try to follow the logic in the WHATWG spec yourself. ;)

Separately, [Web EcmaScript](http://javascript.spec.whatwg.org/) introduces two new single-line comment forms: `<!--` must be treated as if it were `//`, and `-->` (with some crazy start-of-line restrictions) is also treated as a single-line comment.

To some degree the line between the HTML parser and Web EcmaScript is movable. Currently the HTML parser recognizes the `<!--` etc. tokens but pushes them into the data section of the script tag anyway; one could just as easily imagine the HTML parser doing all the work and stripping the "new comment forms" from the token stream.

--scott
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
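To make the escaped and double-escaped modes recapped above concrete, here is a simplified TypeScript sketch of the script data states. `scriptData`, `endTagAt`, `readWord` and `finish` are hypothetical helpers, not part of any spec or library. The sketch assumes the last start tag was `script` (so `</script` is the only "appropriate" end tag), only tracks where the element's content ends instead of emitting tokens, and ignores whatever follows the end tag name.

```typescript
// Simplified sketch of the HTML tokenizer's "script data" states.
// Hypothetical helper scriptData(raw): given the text after a <script> start
// tag, return the element's content and whether a closing end tag was found.
// Assumes the last start tag was "script", so </script is the only
// "appropriate" end tag.

type State =
  | "data" // script data state
  | "esc" | "escDash" | "escDashDash" // script data escaped states (after <!--)
  | "dbl" | "dblDash" | "dblDashDash"; // script data double escaped states

// Characters that may follow a tag name: tab, LF, FF, space, "/", ">".
// (CR never appears here because preprocessing turned it into LF.)
const isTerminator = (c: string | undefined): boolean =>
  c === "\t" || c === "\n" || c === "\f" || c === " " || c === "/" || c === ">";

const isAsciiAlpha = (c: string | undefined): boolean =>
  c !== undefined && /^[A-Za-z]$/.test(c);

// Read a run of ASCII letters starting at `start`; return it ASCII-lowercased
// together with the index just past the run.
function readWord(s: string, start: number): [string, number] {
  let j = start;
  while (isAsciiAlpha(s[j])) j++;
  return [s.slice(start, j).replace(/[A-Z]/g, (c) => c.toLowerCase()), j];
}

// True if an appropriate end tag "</script" + terminator starts at index i.
function endTagAt(s: string, i: number): boolean {
  if (s[i] !== "<" || s[i + 1] !== "/") return false;
  const [name, j] = readWord(s, i + 2);
  return name === "script" && isTerminator(s[j]);
}

function finish(s: string, end: number, closed: boolean): { data: string; closed: boolean } {
  // U+0000 is a parse error in script data and is emitted as U+FFFD.
  return { data: s.slice(0, end).replace(/\0/g, "\uFFFD"), closed };
}

function scriptData(raw: string): { data: string; closed: boolean } {
  const s = raw.replace(/\r\n?/g, "\n"); // input stream preprocessing
  let state: State = "data";
  let i = 0;
  while (i < s.length) {
    const c = s[i];
    if (state === "data") {
      if (endTagAt(s, i)) return finish(s, i, true);
      // Exactly "<!--" lands in the escaped dash-dash state.
      if (s.slice(i, i + 4) === "<!--") { state = "escDashDash"; i += 4; continue; }
      i++;
    } else if (state === "esc" || state === "escDash" || state === "escDashDash") {
      if (c === "<") {
        if (endTagAt(s, i)) return finish(s, i, true); // </script still ends here
        if (isAsciiAlpha(s[i + 1])) {
          // Double escape start: "<script" + terminator enters double escaped.
          const [word, j] = readWord(s, i + 1);
          if (isTerminator(s[j])) { state = word === "script" ? "dbl" : "esc"; i = j + 1; }
          else { state = "esc"; i = j; } // reconsume s[j] in the escaped state
        } else { state = "esc"; i++; }
        continue;
      }
      if (c === "-") state = state === "esc" ? "escDash" : "escDashDash";
      else if (c === ">" && state === "escDashDash") state = "data"; // "-->"
      else state = "esc";
      i++;
    } else {
      if (c === "<") {
        if (s[i + 1] === "/") {
          // Double escape end: "</script" + terminator returns to escaped.
          const [word, j] = readWord(s, i + 2);
          if (isTerminator(s[j])) { state = word === "script" ? "esc" : "dbl"; i = j + 1; }
          else { state = "dbl"; i = j; } // reconsume s[j] in double escaped
        } else { state = "dbl"; i++; }
        continue;
      }
      if (c === "-") state = state === "dbl" ? "dblDash" : "dblDashDash";
      else if (c === ">" && state === "dblDashDash") state = "data"; // "-->"
      else state = "dbl";
      i++;
    }
  }
  return finish(s, s.length, false);
}
```

With this sketch, `scriptData("<!--><script></script>")` stops at the first `</script>` (the `<!-->` drops straight back to plain script data), while `scriptData("<!--<script></script>")` reports no end tag at all: the `</script>` only leaves the double-escaped mode.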
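The "crazy start-of-line restrictions" on `-->` can be written down as a small predicate. This is a minimal sketch following the `HTMLCloseComment` production from the ECMAScript Annex B HTML-like comments: optional whitespace, then zero or more `/* ... */` comments that stay on one line (each optionally followed by whitespace), then `-->`. `isHtmlCloseCommentLine` is a hypothetical name. It assumes the caller has already split the source at line terminators and passes only lines that follow a line terminator (the production requires one) and that do not begin inside a string, template literal or comment; the variant where `-->` directly follows a multi-line comment spanning several lines is not modelled.

```typescript
// Hypothetical helper: does this line (which starts right after a line
// terminator, outside any string/template/comment) begin an HTMLCloseComment?
//   WhiteSpace* ( "/*" ... "*/" WhiteSpace* )* "-->" rest-of-line

// ECMAScript WhiteSpace: TAB, VT, FF, ZWNBSP (U+FEFF) and the Zs category
// (U+0020, U+00A0, U+1680, U+2000-U+200A, U+202F, U+205F, U+3000).
const WS = /[\t\v\f \u00a0\u1680\u2000-\u200a\u202f\u205f\u3000\ufeff]/;

function skipWhiteSpace(line: string, i: number): number {
  while (i < line.length && WS.test(line.charAt(i))) i++;
  return i;
}

function isHtmlCloseCommentLine(line: string): boolean {
  let i = skipWhiteSpace(line, 0);
  // Zero or more single-line delimited comments, each followed by optional whitespace.
  while (line.slice(i, i + 2) === "/*") {
    const close = line.indexOf("*/", i + 2);
    if (close === -1) return false; // the comment continues onto later lines
    i = skipWhiteSpace(line, close + 2);
  }
  return line.slice(i, i + 3) === "-->";
}
```

So a line such as `  /* a */ --> anything` is treated as a comment, while `x = 1; -->` is not, because something other than whitespace and one-line comments precedes the `-->`. The `<!--` form needs no such check: wherever a token may begin, it simply starts a comment that runs to the end of the line, like `//`.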

