On Sun, 7 Nov 2004, André Malo wrote:

> * Nick Kew <[EMAIL PROTECTED]> wrote:
>
> > BTW, the "what is a comment" problem is easier than it looks, as both
> > <script> and <style> are declared in HTML as having CDATA content.
> > That makes it trivial to distinguish them from "inert" comments.
>
> but in xhtml it's PCDATA, which makes them real xml comments...

So long as we're in web-browser-compatible land, we can parse XHTML
with an HTML parser that knows about CDATA.  And when we move out of it,
we're also leaving commented <script> and <style> contents behind.

I'll grant there are other pathological edge-cases due to the ways
people abuse markup.  That's one very good reason none of the modules
I mentioned defaults to stripping comments :-)

-- 
Nick Kew
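
To illustrate the CDATA point above, here is a minimal sketch. It assumes
Python's standard html.parser, which is not one of the modules discussed
in this thread; the class and function names (CommentClassifier, classify)
are hypothetical. The parser treats <script> and <style> as CDATA, so a
"<!-- ... -->" wrapper inside them arrives as ordinary data, while a
comment elsewhere arrives through the comment callback.

```python
from html.parser import HTMLParser


class CommentClassifier(HTMLParser):
    """Hypothetical helper: separates inert comments from the contents
    of <script>/<style>, which an HTML parser treats as CDATA."""

    def __init__(self):
        super().__init__()
        self.inert_comments = []   # real comments, candidates for stripping
        self.cdata_chunks = []     # (tag, text) pairs from script/style
        self._cdata_tag = None     # set while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._cdata_tag = tag

    def handle_endtag(self, tag):
        if tag == self._cdata_tag:
            self._cdata_tag = None

    def handle_comment(self, data):
        # Only reached for comments outside CDATA elements.
        self.inert_comments.append(data)

    def handle_data(self, data):
        # Inside script/style, "<!--" is plain text, not a comment.
        if self._cdata_tag is not None:
            self.cdata_chunks.append((self._cdata_tag, data))


def classify(markup):
    """Return (inert_comments, cdata_chunks) for a markup string."""
    parser = CommentClassifier()
    parser.feed(markup)
    parser.close()
    return parser.inert_comments, parser.cdata_chunks


if __name__ == "__main__":
    sample = (
        "<p>text<!-- inert --></p>"
        "<script><!--\nalert(1)\n//--></script>"
    )
    inert, cdata = classify(sample)
    print("inert comments:", inert)
    print("CDATA contents:", cdata)
```

This only covers the browser-compatible case. A real XML parser fed the
same XHTML would report the script wrapper as a genuine comment, which is
André's objection.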
