On 2010/08/17 23:05:06, tbroyer wrote:
On 2010/08/17 20:19:04, xtof wrote: > On 2010/08/17 18:39:28, jat wrote: > > On 2010/08/16 23:39:47, tbroyer wrote: > > > The HtmlSanitizer is a good idea, but the implementation is very
weak [2].
> > > > Note that the API is what is important, and SimpleHtmlSanitizer is
just
that, > a > > simple implementation. A more involved implementation can be
added later.
> > > > Also, we aren't trying to parse HTML with a regex here, it simply
looks for
> > opening tags and allows unescaped on a small set of whitelisted
tags --
> > everything else gets escaped. If you think it fails to do its
job, can you
> > supply a string which would not be propertly sanitized?
Looking at the code more closely it would merely "fail" by overly
rejecting tags
that are whitelisted: i.e. "<b foo=<i>should be bold" would be
sanitized to
"<b foo=<i>should be bold" and the end part would be italicized
instead of
bold.
And that is exactly correct behavior for this class as document. It only claims to accept HTML with attribute-free tags within the whitelist. It doesn't claim to do anything particularly sensible with input that doesn't fit this constraint; it does however claim that whatever it outputs is safe (will not result in XSS/script execution).
But there's more:
> In other words, does SimpleHtmlSanitizer adhere to the SafeHtml type
contract?
> I believe it does, but of course would like arguments/hunches
towards the
> contrary.
SafeHtml calls for XSS mitigation, and SimpleHtmlSanitizer doesn't
sanitize most
XSS attacks, that use attributes: "<b
style='position:absolute;z-index:2147483646;left:0px;top:0px;right:0px;bottom:0px'
onmousemove=\"alert("you've been highjacked");\">"
That's not true, this would be turned into <b style...., which is safe (though ugly, but again see the above).
> Having said that, I kind of agree that SimpleHtmlSanitizer is of
pretty
limited > use; one of the few scenarios where I can see it used is for
rendering
> messages/snippets that are formatted with limited HTML markup, and
are
obtained > from a complex backend that the developer of the GWT client doesn't
quite want
> to trust to uphold the SafeHtml type contract. I wouldn't be
terribly opposed
> to droppping it. > > I didn't want to implement a full parser, because parsing full HTML
(say in an
> email app) inside the browser generally seems like a bad approach.
I agree, and I'd be OK with removing HtmlSanitizer from this patch to
better
reintroduce it later.
http://gwt-code-reviews.appspot.com/771801/show -- http://groups.google.com/group/Google-Web-Toolkit-Contributors
