Interesting. Karl, does your certainty mean you are saying that
the distinction between the two tags is fundamentally
unknowable to a parser?
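For comparison, here is a minimal Python sketch of the end-tag rule the HTML5 tokenizer applies in its "script data" state. The element ends at the first "</script" (in any letter case) that is followed by whitespace, "/" or ">". The function name script_text is hypothetical, and the sketch deliberately ignores the spec's escaped states entered via "<!--". Applied to the script line of Karl's test page quoted below, it stops at the real end tag, because the split "</s" and "cript>" strings never form that sequence.

```python
import re

# Simplified HTML5 "script data" rule: the element ends at the first
# "</script" (any letter case) followed by whitespace, "/" or ">".
# The spec's escaped states (entered via "<!--") are not modelled here.
SCRIPT_END = re.compile(r"</script[\t\n\f\r />]", re.IGNORECASE)


def script_text(html: str, start: int) -> str:
    """Return the raw text of a <script> element whose content starts at
    index `start` (just past the '>' of the opening tag). With no end tag,
    the text runs to the end of the input."""
    m = SCRIPT_END.search(html, start)
    return html[start:m.start() if m else len(html)]


page = ('<script>document.write("<script></s");'
        'document.write("cript>")</script>\n'
        '<p>paragraph</p>\n')
body = script_text(page, len("<script>"))
print(body)
# document.write("<script></s");document.write("cript>")
```

Under this rule, everything after the end tag, including <p>paragraph</p>, is ordinary body markup rather than script text.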

I guess one good sign is that there appears to be a lot of
past discussion of this issue on the Tidy listservs, including
a thread from 2006 called "Tidy barfs on split <SCRIPT> tags".
Unless it's an impossible problem, maybe these past threads
will contain something we can use.  I will read through some
of them.

This reminds me of other gnarly situations with literals.
For instance, when a regular expression in a JavaScript string
contains only a closing brace or closing parenthesis, and I
come along wanting to make assumptions about pairs of braces,
the unmatched literal throws me out of sync.
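The brace problem has the same shape: a counter that does not know about literals loses sync. One common remedy is to skip over quoted strings while counting. The sketch below is a minimal Python illustration of that idea; check_brackets is a hypothetical helper, not edbrowse code. It handles a brace or parenthesis inside a string such as new RegExp("\\)"), but not regular expression literals like /\)/, which need knowledge of the preceding token to tell apart from division.

```python
# Bracket pairs to track; the closer is what we expect to see later.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())
QUOTES = "'\"`"


def check_brackets(src: str) -> bool:
    """Return True if (), [] and {} balance in src, ignoring brackets that
    appear inside '...', "..." or `...` string literals.

    Regex literals such as /\\)/ and comments are NOT handled: telling a
    regex literal from a division sign depends on the preceding token.
    """
    expected = []  # stack of closers we still owe
    i = 0
    while i < len(src):
        c = src[i]
        if c in QUOTES:
            # Skip to the matching quote, stepping over backslash escapes.
            i += 1
            while i < len(src) and src[i] != c:
                if src[i] == "\\":
                    i += 1
                i += 1
        elif c in PAIRS:
            expected.append(PAIRS[c])
        elif c in CLOSERS:
            if not expected or expected.pop() != c:
                return False
        i += 1  # also steps past a string's closing quote
    return not expected


# The ")" inside the string no longer throws the count out of sync.
print(check_brackets('parts = s.split(")"); if (parts) { go(); }'))  # True
```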

Kevin


On Thu, 10 Sep 2015, Karl Dahlke wrote:

I'm fairly certain, and fairly concerned, that this is a Tidy bug
that we can't get around.
The source is as follows.

<body>
<script>document.write("<script></s");document.write("cript>")</script>
<p>paragraph</p>
</body>

db6
js
b

undoCompare no undo map
line 1 column 1: missing <!DOCTYPE> declaration
line 2 column 34: '<' + '/' + letter not allowed here
line 2 column 69: '<' + '/' + letter not allowed here
line 3 column 14: '<' + '/' + letter not allowed here
line 4 column 5: '<' + '/' + letter not allowed here
line 2 column 1: missing </script>
line 2 column 1: missing </script>
line 1 column 1: inserting missing 'title' element
Node(0): Root {
Node(1): DOCTYPE {
@PUBLIC = (null)
}
Node(1): html {
Node(2): head {
Node(3): meta {
@name = generator
@content = HTML Tidy for HTML5 for Linux/x86 version 5.1.2
}
Node(3): title {
}
}
Node(2): body {
Node(3): script {
Node(4): Text {
Text: document.write("<script><\/s");document.write("cript>")<\/script>
<p>paragraph<\/p>
<\/body>

}
}
}
}
}
||

So, as you can see, all the text is subsumed under the script tag,
and the slashes are escaped.
Tidy doesn't grasp the </script> terminator.
Thoughts?

Karl Dahlke
_______________________________________________
Edbrowse-dev mailing list
[email protected]
http://lists.the-brannons.com/mailman/listinfo/edbrowse-dev


--------
Kevin Carhart * 415 225 5306 * The Ten Ninety Nihilists