Re: type=HTML

Sam Ruby Tue, 08 Feb 2005 13:02:31 -0800


Julian Reschke wrote:

> Sam Ruby wrote:
>> Julian Reschke wrote:
>>> (<http://atompub.org/2005/01/27/draft-ietf-atompub-format-05.html#rfc.section.3.1.1>)
>>> The spec currently says:
>>> "If the value of "type" is "HTML", the content of the Text construct MUST NOT contain child elements, and SHOULD be suitable for handling by software that knows HTML. The HTML markup must be escaped; for example, "<br>" as "&lt;br>". The HTML markup SHOULD be such that it could validly appear directly within an HTML <DIV> element. Receiving software which displays the content MAY use the markup to aid in displaying it."
>>>
>>> Is there anything that we can say about what recipients should do if they are not prepared to tag-soup-parse HTML content (such as something based on XSLT1 in Mozilla, or something running in a size-constrained environment; does MIDP come with an HTML parser?)? Skip the entry? Not display the content? Display the content, including the escaped markup, as plain text?
>> I would suggest stripping the tags.  In Perl, something like this:
>> s/<.*?>//g
> Thanks. Are we 100% confident that whatever results from that replacement can be safely embedded? For instance, what about <script> tags? Can they contain potentially dangerous code that would execute without being referenced from somewhere?

If one did the simplistic elimination of tags that I mention above, then scripts would show as content as opposed to being executed. Another place where the results would be grossly suboptimal would be <table>s.
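A minimal Perl sketch of that simplistic elimination, assuming the escaped HTML has already been unescaped once by the XML parser (so "&lt;br>" arrives as "<br>"). The helper name `strip_tags` is hypothetical, and the `/s` modifier is a small addition to the one-liner so that a tag split across lines is also removed.

```perl
use strict;
use warnings;

# Remove anything that looks like a tag, as in the quoted one-liner.
# /s (an addition) lets "." match newlines, so multi-line tags go too.
sub strip_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<.*?>//gs;
    return $text;
}

# A script element: the tags are removed, but the script body
# survives as ordinary visible text rather than being executed.
print strip_tags('<p>Hi</p><script>alert("hi")</script>'), "\n";
# prints: Hialert("hi")

# A table: the cell boundaries vanish and the cells run together.
print strip_tags('<table><tr><td>a</td><td>b</td></tr></table>'), "\n";
# prints: ab
```

The output is only safe in the sense described here: it must be shown as plain text, not handed back to an HTML renderer.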

But in other cases, the results would largely display as intended. Text would not be in bold/italics/whatever, superscripts and subscripts would not be raised or lowered, and lists would be flowed, but for most weblog posts the ideas would get through.
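A short Perl sketch of the common case, using the same substitution (with `/s` added) inside a hypothetical `strip_tags` helper. The sample strings are illustrative only: the words survive, while sub/superscript positioning and list structure are lost.

```perl
use strict;
use warnings;

# Same tag-stripping substitution as the quoted one-liner,
# with /s added so that tags may span lines.
sub strip_tags {
    my ($html) = @_;
    (my $text = $html) =~ s/<.*?>//gs;
    return $text;
}

# Inline markup: emphasis and sub/superscripts are dropped,
# but the sentence is still readable.
print strip_tags('Water is H<sub>2</sub>O and <em>E = mc<sup>2</sup></em>.'), "\n";
# prints: Water is H2O and E = mc2.

# A list: the items flow together with no separators.
print strip_tags('<ul><li>one</li><li>two</li></ul>'), "\n";
# prints: onetwo
```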

But in all cases, the results could be safely displayed as text.

- Sam Ruby
