On 26 June 2012 15:45, Oleg Kalnichevski <[email protected]> wrote:
> On Tue, 2012-06-26 at 15:21 +0100, sebb wrote:
>> On 26 June 2012 14:37, Oleg Kalnichevski <[email protected]> wrote:
>> > On Tue, 2012-06-26 at 14:21 +0100, sebb wrote:
>> >> On 26 June 2012 13:33, Oleg Kalnichevski <[email protected]> wrote:
>> >> > On Tue, 2012-06-26 at 11:41 +0100, sebb wrote:
>> >> >> On 26 June 2012 08:46, Oleg Kalnichevski <[email protected]> wrote:
>> >> >> > On Tue, 2012-06-26 at 02:00 +0100, sebb wrote:
>> >> >> >> The escaping of non-alphabetic characters by the format methods is
>> >> >> >> no longer quite the same as that done by java.net.URLEncoder.encode.
>> >> >> >>
>> >> >> >> The former allows the chars in ".-*_!'()" to pass through without
>> >> >> >> conversion, whereas the latter only allows ".-*_" unchanged.
>> >> >> >> The latter is also how browsers behave when escaping form fields.
>> >> >> >>
>> >> >> >> I think the behaviour should be consistent with URLEncoder and
>> >> >> >> browsers.
>> >> >> >> That was in fact the behaviour with 4.2, which delegated the
>> >> >> >> escaping to URLEncoder.
>> >> >> >> I think the code should revert to using URLEncoder/URLDecoder.
>> >> >> >>
>> >> >> >> There is still a need for the extended path, query and fragment
>> >> >> >> escape/unescape methods, but perhaps these belong in URIBuilder?
>> >> >> >> If not, maybe they should be in a separate class anyway?
>> >> >> >>
>> >> >> >
>> >> >> > Would not that lead to inconsistent behavior when the same query form
>> >> >> > gets encoded differently depending on whether it is enclosed in the
>> >> >> > request URI or in the request body?
>> >> >>
>> >> >> I don't think so, I think encodeFormFields could use a different safe
>> >> >> character set without problems, so long as the safe set is a subset of
>> >> >> all possible safe query characters.
>> >> >> In fact the UNRESERVED BitSet is only currently used in
>> >> >> URLEncodedUtils#encodeFormFields(), so I don't see how changing
>> >> >> encodeFormFields to use a different safe set can affect anything.
>> >> >>
>> >> >> Besides, AFAIK 4.2 did not have a problem with using a more limited
>> >> >> safe set.
>> >> >>
>> >> >> > Browsers do a lot of silly stuff to maximize compatibility with all
>> >> >> > sorts of broken software out there. I am not sure we need to do
>> >> >> > likewise.
>> >> >>
>> >> >> Well-written software will be able to deal with form data that has
>> >> >> some additional safe characters encoded, so I don't think there is any
>> >> >> problem in playing safe here.
>> >> >>
>> >> >> [But if we do decide to change the safe list from the one previously
>> >> >> used, it needs to be flagged up in the release notes.]
>> >> >>
>> >> >
>> >> > Likewise well-written software should be able to deal with the form data
>> >> > containing valid URL encoded content. To me this is more about doing the
>> >> > right thing rather than making sure some broken code is unaffected.
>> >>
>> >> http://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
>> >> says that reserved chars are to be encoded as per RFC 1738 section 2.2.
>> >>
>> >> This implies that the safe set of chars is "$-_.+!*'()," plus "=" as
>> >> it is reserved for the delimiter
>> >> 4.2.1 doesn't currently allow "$", so arguably is not "doing the right
>> >> thing" anyway.
>> >>
>> >
>> > RFC 1738 was superseded by RFC 2396 (which is what java.net.URI is based
>> > on and this is what we ought to use as a basis as well). RFC 2396
>> > clearly states "$" is one of the reserved characters.
>> >
>> > ---
>> > 2.2. Reserved Characters
>> >
>> > Many URI include components consisting of or delimited by, certain
>> > special characters. These characters are called "reserved", since
>> > their usage within the URI component is limited to their reserved
>> > purpose.
>> > If the data for a URI component would conflict with the
>> > reserved purpose, then the conflicting data must be escaped before
>> > forming the URI.
>> >
>> > reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
>> > "$" | ","
>>
>> But AFAIK "$" is not reserved within form data (or a general query),
>> so does not need to be escaped.
>> Also "~" is not reserved, but is escaped by browsers and 4.2 and 4.2.1.
>>
>
> Are you sure about 4.2.1? As far as I can tell it should not as it is
> clearly included in the UNRESERVED set.

My bad, "~" is treated as safe by 4.2.1.

>> More fun: RFC 2396 is superseded by RFC 3986.
>> The lists of allowable characters for path and query have not changed,
>> but the reserved list is now larger.
>> The only unreserved characters are now ".-_~", i.e. "!'()*" are now
>> reserved (as are "#[]") ...
>>
>
> I am aware of RFC 2396 having been superseded by RFC 3986. However as
> long as we target Java 1.5 as the minimal runtime level, we should stick
> to the same compliance level as the java.net.URI, which is RFC 2396 for
> Java 1.5.

OK. BTW Java 1.6 URI still references 2396.

> Oleg
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
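
For anyone following the thread, the difference between the two safe sets can be sketched in a few lines of Java. This is only an illustration, not the HttpClient code: the class SafeSetDemo and its methods safeSet and encode are hypothetical names, and the two punctuation strings are taken from the description in this thread (".-*_" for URLEncoder, plus "!'()" and "~" for the wider set) rather than from the actual UNRESERVED BitSet in URLEncodedUtils.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical demo class; not part of HttpClient.
class SafeSetDemo {

    private static final String HEX = "0123456789ABCDEF";

    // Builds a safe set from ASCII letters, digits and the given ASCII punctuation.
    static BitSet safeSet(String punctuation) {
        BitSet safe = new BitSet(128);
        for (char c = 'a'; c <= 'z'; c++) safe.set(c);
        for (char c = 'A'; c <= 'Z'; c++) safe.set(c);
        for (char c = '0'; c <= '9'; c++) safe.set(c);
        for (int i = 0; i < punctuation.length(); i++) safe.set(punctuation.charAt(i));
        return safe;
    }

    // Form-style encoding: safe bytes pass through, space becomes '+',
    // every other UTF-8 byte becomes %XX with upper-case hex digits.
    static String encode(String text, BitSet safe) {
        StringBuilder out = new StringBuilder();
        for (byte raw : text.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xff;
            if (safe.get(b)) {
                out.append((char) b);
            } else if (b == ' ') {
                out.append('+');
            } else {
                out.append('%').append(HEX.charAt(b >> 4)).append(HEX.charAt(b & 0x0f));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String sample = ".-*_!'()~$";

        // JDK behaviour: only ".-*_" survive among the punctuation.
        System.out.println(URLEncoder.encode(sample, "UTF-8"));
        // prints .-*_%21%27%28%29%7E%24

        // Narrow safe set, assumed to mirror URLEncoder.
        System.out.println(encode(sample, safeSet(".-*_")));
        // prints .-*_%21%27%28%29%7E%24

        // Wider safe set as described in the thread (an assumption, see above).
        System.out.println(encode(sample, safeSet(".-*_!'()~")));
        // prints .-*_!'()~%24
    }
}
```

With the narrow set the output matches URLEncoder byte for byte; with the wider set only "$" is escaped. A decoder that accepts one form should accept the other, since percent-decoding an escaped "!" or "~" yields the same character.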
