On 2011/04/11 23:18:13, xtof wrote:
If we're both in agreement on that, I think we should change the doc of the
SafeUri contract to be a bit more specific about that. I.e., instead of

  /**
   * An object that implements this interface encapsulates a URI that is
   * guaranteed to be safe to use (with respect to potential Cross-Site-Scripting
   * vulnerabilities) in a URL context, for example in a URL-typed attribute in
   * an HTML document.

something more along these lines:

/**
An object that implements this interface encapsulates a lexically valid
(with respect to the set of valid characters specified in RFC 3986) URI that,
when dereferenced, will not result in browser-side execution of script that
is not under program control.

<p>
Note that this type's contract does not constrain the set of characters that
may appear in the URI beyond the requirements of RFC 3986. If the value of a
SafeUri object is used in certain contexts of an HTML document, appropriate
escaping must be applied.

SGTM

> The CSS spec says whitespace, quotes (single and double) and parentheses
> can be escaped using a backslash:
> http://www.w3.org/TR/CSS2/syndata.html#uri

I believe I've read somewhere that this doesn't work in IE, but I can't find
the reference now...

One problem I'd been worried about is this: parentheses are legal in URIs
per the RFC. At the same time, \xxxxxx escaping them in the CSS doesn't
actually seem to be sufficient; such escaping doesn't prevent the string from
being interpreted as CSS syntax (what were they thinking???), see
http://code.google.com/p/browsersec/wiki/Part1#CSS_character_encoding.

Note that the CSS spec talks about using '\(', not '\028'. However, I have
absolutely no idea how well this is supported by browsers.

I think we'll just have to %-escape parentheses and single quotes when using
a SafeUri in a CSS url(). I'm not really aware of them meaning anything
special in *http* URIs; even if RFC 3986 says that %-escaping parentheses
won't result in an equivalent URI, I don't know of any reason why doing so
would result in a different resource being addressed on a web server.
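A minimal sketch of that CSS-specific step, assuming the input has already
been sanitized into a lexically valid RFC 3986 URI (so whitespace, double
quotes and backslashes are already %-escaped and only the RFC-legal
characters ( ) ' remain to handle). The class and method names are
hypothetical, not existing GWT API:

```java
// Hypothetical helper; not part of GWT.
final class CssUrlEscaper {

  private CssUrlEscaper() {}

  /**
   * Percent-escapes the RFC 3986-legal characters that could end or break
   * out of a CSS url('...') token: '(' , ')' and '\''. Assumes the input is
   * already a lexically valid URI, so no other characters need handling.
   */
  static String escapeForCssUrl(String uri) {
    StringBuilder sb = new StringBuilder(uri.length());
    for (int i = 0; i < uri.length(); i++) {
      char c = uri.charAt(i);
      switch (c) {
        case '(':
          sb.append("%28");
          break;
        case ')':
          sb.append("%29");
          break;
        case '\'':
          sb.append("%27");
          break;
        default:
          sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Prints: url('http://example.com/a%28b%29%27c.png')
    System.out.println("url('" + escapeForCssUrl("http://example.com/a(b)'c.png") + "')");
  }
}
```

Escaping these three characters unconditionally in the CSS context is
cheaper than relying on CSS backslash escapes, whose browser support is
unclear per the discussion above.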

+1

> So maybe we could special-case SafeUri in CSS contexts in the generator?
>
I think we should just ban it (or at least warn), since we don't parse the
CSS and can't really tell what context in the CSS the URI is used in. A
safe style builder API would know, and would escape the URI correctly.

We already have the warning about "SafeUri outside URL-attribute context";
we could certainly specialize the message a bit for the CSS context.


I think you're right; sanitizeUri should %-escape characters not allowed by
RFC 3986. Looks like passing the string through

http://google-web-toolkit.googlecode.com/svn/javadoc/latest/com/google/gwt/http/client/URL.html#encode%28java.lang.String%29

would be sufficient? Anyway, I'll take a TODO item for this.

URL.encode wouldn't preserve existing %-escapes and would replace, e.g.,
%28 with %2528. The algorithm should probably be similar to
htmlEscapeAllowEntities (i.e. split on %-escapes and encode the rest).
Also, URL.encode doesn't %-escape single quotes and parentheses.

Finally, URL.encode is "client side only", so an equivalent "pure Java"
algorithm should be written for "the JVM" (that would be less efficient in
the browser). Maybe let's start with a single, shared algorithm and add the
URL.encode-based one later (and benchmark!)
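A rough sketch of such a shared, pure-Java algorithm, in the spirit of
htmlEscapeAllowEntities: well-formed %XX escapes are copied through
unchanged, and every other character outside RFC 3986's unreserved and
reserved sets (including a stray '%') is percent-encoded as UTF-8. The name
escapeUri is hypothetical; like URL.encode, this leaves ' ( ) alone because
RFC 3986 allows them, so the CSS case still needs its own step:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not existing GWT API.
final class UriEscaper {

  private UriEscaper() {}

  // Non-alphanumeric characters RFC 3986 allows literally:
  // unreserved ("-._~"), gen-delims (":/?#[]@") and sub-delims ("!$&'()*+,;=").
  private static final String ALLOWED_PUNCTUATION = "-._~:/?#[]@!$&'()*+,;=";
  private static final char[] HEX = "0123456789ABCDEF".toCharArray();

  private static boolean isAllowed(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
        || ALLOWED_PUNCTUATION.indexOf(c) >= 0;
  }

  private static boolean isHex(char c) {
    return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
  }

  /** Preserves valid %XX escapes and percent-encodes everything RFC 3986 disallows. */
  static String escapeUri(String s) {
    StringBuilder out = new StringBuilder(s.length());
    int i = 0;
    while (i < s.length()) {
      char c = s.charAt(i);
      if (c == '%' && i + 2 < s.length() && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
        out.append(s, i, i + 3); // existing escape: copy verbatim
        i += 3;
      } else if (isAllowed(c)) {
        out.append(c);
        i++;
      } else {
        // Encode one whole code point (possibly a surrogate pair) as UTF-8 bytes.
        // Note: getBytes() turns a lone surrogate into '?', which becomes %3F.
        int len = Character.charCount(s.codePointAt(i));
        for (byte b : s.substring(i, i + len).getBytes(StandardCharsets.UTF_8)) {
          out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
        i += len;
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Prints: http://example.com/a%20b%28c%25zz
    System.out.println(escapeUri("http://example.com/a b%28c%zz"));
  }
}
```

Unlike URL.encode, the existing %28 survives as %28 rather than becoming
%2528, while the malformed "%zz" has its '%' encoded as %25.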

I'm still wondering if we should properly parse the whole URL and insist
it's syntactically valid; when I wrote sanitizeUri I chose not to because
there's no easily available way to do it in GWT (since java.net.URI isn't
implemented), and it would seem to be fairly costly to do.
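For comparison, on the JVM side alone a full syntax check could lean on
java.net.URI, as in the sketch below (the helper name is hypothetical).
Two caveats: java.net.URI follows RFC 2396 as amended by RFC 2732, not
RFC 3986, so it only approximates the RFC 3986 grammar; and syntactic
validity says nothing about safety, since "javascript:alert(1)" parses
fine, so a scheme check would still be needed either way.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical JVM-only helper; java.net.URI is not available in GWT client code.
final class UriSyntaxCheck {

  private UriSyntaxCheck() {}

  /** Returns true if java.net.URI (RFC 2396-based) accepts the string. */
  static boolean isSyntacticallyValid(String s) {
    try {
      new URI(s);
      return true;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isSyntacticallyValid("http://example.com/a%20b")); // true
    System.out.println(isSyntacticallyValid("http://example.com/a b"));   // false: illegal space
    System.out.println(isSyntacticallyValid("http://example.com/%zz"));   // false: malformed escape
    System.out.println(isSyntacticallyValid("javascript:alert(1)"));      // true, yet unsafe
  }
}
```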

I believe %-escaping à la URL.encode would provide enough safety.

http://gwt-code-reviews.appspot.com/1380806/

--
http://groups.google.com/group/Google-Web-Toolkit-Contributors