On Wed, May 05, 2010 at 07:49:31PM +0200, Aristotle Pagaltzis wrote:
> * Louis-David Mitterrand <[email protected]> [2010-05-05 16:05]:
> > What would be a "reasonable defaults" whitelist for html tags
> > in a forum context?
> 
> All the tags Markdown has syntax for:
> 
>     em strong a img code br
>     p ul ol li blockquote pre h1 h2 h3 h4 h5 h6
> 
> Plus a few very reasonable extras:
> 
>     i b cite del ins
>     dl dd dt
> 
> Attributes that should be allowed:
> 
>     a: href title
>     img: src alt title
>     ol: start
>     blockquote: cite
> 
> That's the minimal reasonable set, I think.
> 
> You may or may not want to also whitelist the table-related tags:
> 
>     table tr td th
>     tbody tfoot thead caption
> 
> Most of their possible attributes should be allowed in that case.
> 
> For those, you'll need to tidy the HTML, not just scrub it, else
> people will be able to break your layout in malicious ways.
> 
> You ***DON'T*** want to whitelist the `style` attribute under any
> circumstances, unless you also have a very very very careful CSS
> scrubber, because otherwise it's possible to inject Javascript
> that way.
> 
> You'll also want to validate `...@href` values to keep people from
> putting `javascript:` URIs or similar foolishness in there. If in
> doubt, allow too little.
> 

Thank you, Aristotle, for the detailed and informative answer. Very
useful indeed.

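Fortunately, HTML::Scrubber can filter attribute values with a regexp:
an attribute is kept only if its value matches. These rules drop any
`href` or `src` whose value starts with `script` or `javascript`: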

        'href' => qr{^(?!(?:java)?script)}i,
        'src'  => qr{^(?!(?:java)?script)}i,
        etc.
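
For what it's worth, here is a rough sketch of how the whole whitelist
from the quoted message might be wired into HTML::Scrubber. It is
untested against a real forum, and the `$safe_url` pattern is my own
assumption about which URL forms are acceptable, not anything the module
prescribes. It whitelists known-good URL prefixes instead of
blacklisting `javascript:`, in the spirit of "if in doubt, allow too
little":

```perl
use strict;
use warnings;
use HTML::Scrubber;

# Assumption: only these URL forms are acceptable in a forum post.
# Anything else (javascript:, vbscript:, data:, leading whitespace,
# bare relative paths) fails the match and the attribute is dropped.
my $safe_url = qr{^(?:https?://|mailto:|/|#)}i;

my $scrubber = HTML::Scrubber->new;

# Tags allowed with no attributes at all (see the default rule below).
$scrubber->allow(qw[
    em strong code br p ul li pre h1 h2 h3 h4 h5 h6
    i b cite del ins dl dd dt
]);

# Tags that keep a few attributes; '*' => 0 drops every other attribute.
$scrubber->rules(
    a          => { href  => $safe_url, title => 1, '*' => 0 },
    img        => { src   => $safe_url, alt => 1, title => 1, '*' => 0 },
    ol         => { start => qr{^\d+\z}, '*' => 0 },
    blockquote => { cite  => $safe_url, '*' => 0 },
);

# Default: deny unlisted tags and deny all attributes, so `style`
# never gets through.
$scrubber->default( 0 => { '*' => 0 } );

my $clean = $scrubber->scrub(
    '<p style="color:red">Hi <a href="javascript:alert(1)">there</a></p>'
);
print "$clean\n";
```

The table-related tags are left out on purpose, since those would also
need a tidy pass and not just scrubbing.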


-- 
http://www.cruisefish.net
_______________________________________________
Markdown-Discuss mailing list
[email protected]
http://six.pairlist.net/mailman/listinfo/markdown-discuss
