https://issues.apache.org/bugzilla/show_bug.cgi?id=35256
--- Comment #15 from Timothy Ace <[email protected]> 2010-12-28 15:33:41 EST ---

My company has also run into several issues with AllowEncodedSlashes already. These issues mostly come up in cases where PATH_INFO is being used either for a resource name in a REST API or for an asset name for a video, document, news article, etc. that contains a slash in its name. This makes us very invested in this issue.

Quite honestly, the current implementation is wrong and violates the RFC. Check out Example 2 from the REDUCED OR INCREASED SAFE CHARACTER SETS section of RFC 1630:

    Example 2

    The URIs

        http://info.cern.ch/albert/bertram/marie-claude

    and

        http://info.cern.ch/albert/bertram%2Fmarie-claude

    are NOT identical, as in the second case the encoded slash does
    not have hierarchical significance.

Tim specifically called out this example in RFC 1630 and it is of great importance to us for two reasons:

1. It shows concretely that having a %2F in the URL is valid. Having httpd reject this request with a 404 error by default makes it non-compliant with RFC 1630 out of the box.

2. Even if we turn on AllowEncodedSlashes, httpd interprets the %2F as a path separator, violating RFC 1630 because it makes the two URLs in Example 2 above equivalent. For example, if "albert" is the name of the script or handler, then the PATH_INFO for both URLs will be "/bertram/marie-claude", so the two requests are indistinguishable from one another and are therefore treated as identical.

Of note is that RFC 1630 has not been updated or obsoleted by any other RFC and is still the basis for URLs on the WWW, something core to httpd.

While Section 2.4.2 of RFC 2396 (Section 2.4 in RFC 3986, which obsoletes RFC 2396) mentions that a tilde (~) and a %7E can be used interchangeably in a URL, it is not pertinent to this issue, since a tilde is not a "reserved character" (it is specifically called out as an "unreserved character"), whereas a slash (/) is reserved.
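To make the equivalence problem concrete, here is a minimal Python sketch (not httpd code). It assumes a hypothetical handler mounted at "/albert"; the function names are made up for illustration. Decoding the whole path remainder in one pass collapses the two Example 2 URLs, while splitting on literal slashes before decoding keeps them distinct.

```python
from urllib.parse import unquote, urlsplit

PLAIN = "http://info.cern.ch/albert/bertram/marie-claude"
ENCODED = "http://info.cern.ch/albert/bertram%2Fmarie-claude"


def decoded_path_info(url, script="/albert"):
    """Decode everything after the (hypothetical) script name in one pass.

    This mimics the lossy behaviour described above: %2F becomes "/".
    """
    return unquote(urlsplit(url).path[len(script):])


def path_segments(url, script="/albert"):
    """Split on literal slashes first, then decode each segment.

    An encoded slash stays inside its segment instead of creating a new one.
    """
    raw = urlsplit(url).path[len(script):]
    return [unquote(seg) for seg in raw.split("/") if seg]


if __name__ == "__main__":
    print(decoded_path_info(PLAIN) == decoded_path_info(ENCODED))  # True
    print(path_segments(PLAIN))    # ['bertram', 'marie-claude']
    print(path_segments(ENCODED))  # ['bertram/marie-claude']
```

The second function is only one possible way for an application to preserve the distinction; it works only if the server hands over the path with %2F still encoded.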
From Section 2.2 of RFC 3986:

    reserved    = gen-delims / sub-delims

    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="

    The purpose of reserved characters is to provide a set of
    delimiting characters that are distinguishable from other data
    within a URI. URIs that differ in the replacement of a reserved
    character with its corresponding percent-encoded octet are not
    equivalent. PERCENT-ENCODING A RESERVED CHARACTER, OR DECODING A
    PERCENT-ENCODED OCTET THAT CORRESPONDS TO A RESERVED CHARACTER,
    WILL CHANGE HOW THE URI IS INTERPRETED BY MOST APPLICATIONS. THUS,
    CHARACTERS IN THE RESERVED SET ARE PROTECTED FROM NORMALIZATION
    AND ARE THEREFORE SAFE TO BE USED BY SCHEME-SPECIFIC AND
    PRODUCER-SPECIFIC ALGORITHMS FOR DELIMITING DATA SUBCOMPONENTS
    WITHIN A URI.

I realize that it does say "most applications"; however, it goes on in the next sentence to say that "characters in the reserved set are protected from normalization". Therefore the correct solution here is to change httpd to NEVER decode any of the reserved characters from the ABNF. This would follow RFC 1630 & RFC 3986 and would also make the note in the documentation for the AllowEncodedSlashes directive (http://httpd.apache.org/docs/2.2/en/mod/core.html#allowencodedslashes) correct once again, in that slashes will not be decoded.

Two additional notes:

1. AllowEncodedSlashes should really be "on" by default and probably even deprecated. From what I can tell, the only thing it protects against is poor application writers, and it does so in a less-than-graceful way by slapping up a 404. It also seems that only a very small percentage of people even know about AllowEncodedSlashes, and those who do end up turning it on, because they found out about it only after spending a few hours scratching their heads, modifying configurations and rewrite rules, trying to figure out why a valid URL was being rejected.

2. Nowhere in the RFCs is a backslash (\) listed as a reserved character. Therefore a %5C *should* always be decoded, the same way %7E is converted to a tilde (~).
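The proposal above (never decode reserved characters, decode everything else) can be sketched in a few lines of Python. This is only an illustration of the rule, not a patch for httpd, and the function name is hypothetical. One assumption goes beyond the comment: "%25" is also left encoded, so that a literal "%2F" in the data cannot be confused with an encoded slash in the output.

```python
import re
from urllib.parse import unquote

# Reserved set from RFC 3986 section 2.2 (gen-delims + sub-delims).
RESERVED = ":/?#[]@!$&'()*+,;="
# Assumption of this sketch: also protect "%" itself to keep output unambiguous.
PROTECTED = RESERVED + "%"

# Matches the percent-encoded form of any protected character, e.g. %2F or %2f.
_PROTECTED_ESCAPE = re.compile(
    "(" + "|".join("%%%02X" % ord(c) for c in PROTECTED) + ")",
    re.IGNORECASE,
)


def decode_except_reserved(path):
    """Percent-decode `path`, leaving escapes of protected characters intact."""
    parts = _PROTECTED_ESCAPE.split(path)
    # With a capturing group, re.split puts the matched escapes at odd indices.
    return "".join(
        part.upper() if i % 2 else unquote(part)
        for i, part in enumerate(parts)
    )


if __name__ == "__main__":
    # %2F stays encoded; %7E and %5C are decoded to "~" and "\".
    print(decode_except_reserved("/bertram%2Fmarie%7Eclaude%5Cx"))
```

Under this rule the two URLs from Example 2 remain distinguishable after decoding, while %7E and %5C are normalized as described in the notes above.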
