Alex Kapranoff <[EMAIL PROTECTED]> writes:

> As far as I can understand HTML::Parser simply ignores closing
> </plaintext> tag. I read the tests and Changes so I see that this is
> intended behaviour and <plaintext> is special-cased of all CDATA
> elements.
>
> Does someone know the reasoning of this decision? :) It is just plain
> interesting.

A long time ago the HTTP protocol did not have MIME-like headers. The client sent a "GET foo" line and the server responded with HTML and then closed the connection. Since there was no way for the server to indicate any Content-Type other than text/html, the <plaintext> tag was introduced so that text files could be served by just prefixing the file content with this tag. This was before the <img> tag was invented, so luckily we don't have a similar unclosed <gif> tag :)

> Does HTML::Parser imitate some old browser here?

Yes, it was there in the beginning but still seems well supported. Of my current browsers, both Konqueror and MSIE support this. Firefox supports it in the same way as <xmp>, i.e. it allows you to escape out of it with </plaintext>.

The <plaintext> tag is described in this historic document:

  http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html#7

> It results in weird effects for me as I write a HTML sanitizer for
> WebMail.

How come? Do you have a need to suppress this behaviour in HTML::Parser?

Regards,
Gisle