[
https://issues.apache.org/jira/browse/HTTPCLIENT-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758176#comment-16758176
]
Clinton Gormley commented on HTTPCLIENT-1968:
---------------------------------------------
RFC 2396 says:
2:
{quote}Within a URI, characters are either used as delimiters, or to represent
strings of data (octets) within the delimited portions. Octets are either
represented directly by a character (using the US-ASCII character for that
octet [ASCII]) or by an escape encoding.{quote}
2.2:
{quote}Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since their usage
within the URI component is limited to their reserved purpose. If the data for
a URI component would conflict with the reserved purpose, then the conflicting
data must be escaped before forming the URI.{quote}
2.4.2:
{quote}A URI is always in an "escaped" form, since escaping or unescaping a
completed URI might change its semantics. Normally, the only time escape
encodings can safely be made is when the URI is being created from its
component parts; each component may have its own set of characters that are
reserved, so only the mechanism responsible for generating or interpreting that
component can determine whether or not escaping a character will change its
semantics. Likewise, a URI must be separated into its components before the
escaped characters within those components can be safely decoded.{quote}
Treating `/p1/%2Fp2` as equivalent to `/p1//p2` is incorrect. As explained in
2.4.2, you cannot unescape the path without first breaking it into its
components (by splitting on `/`). Then you need to re-escape before
reassembling the path (concat'ing with `/`).
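
To make the split, unescape, re-escape, reassemble order concrete, here is a minimal sketch in Python using only the standard `urllib.parse` module. It is an illustration of the rule in 2.4.2, not HttpClient's code; the helper name `normalize_path` is hypothetical, and escaping every reserved character in a segment (`safe=""`) is a simplifying assumption.

```python
from urllib.parse import quote, unquote


def normalize_path(path: str) -> str:
    """Normalise an escaped path one segment at a time (hypothetical helper)."""
    # Split on the literal "/" delimiter first; an escaped %2F is data, not a delimiter.
    segments = path.split("/")
    # Unescape each segment on its own, then re-escape it with safe="" so that
    # a "/" inside a segment is written back as %2F before the segments are joined.
    return "/".join(quote(unquote(seg), safe="") for seg in segments)


# Unescaping the whole path at once turns the escaped slash into a delimiter.
naive = unquote("/p1/%2Fp2")
# Working segment by segment keeps the escaped slash as data.
careful = normalize_path("/p1/%2Fp2")
```

With this sketch, `naive` is `/p1//p2` while `careful` is still `/p1/%2Fp2`, so the two paths stay distinct.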
Imagine you had a path template like `/page/{id}` and a page whose id was
`/foo`. You have to escape the `/` before forming the path, so you would end up
with `/page/%2Ffoo`. The way you're doing normalisation changes the meaning of
the original path, and so is disallowed by 2.4.2.
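
The `/page/{id}` case can be sketched the same way. This is a hedged Python illustration using the standard `urllib.parse` module; the function `page_path` and the variable names are hypothetical and only stand in for whatever code builds the URI from its component parts.

```python
from urllib.parse import quote, unquote


def page_path(page_id: str) -> str:
    """Build /page/{id}, escaping the id so it remains a single path segment."""
    # safe="" makes quote() escape "/" as well, which it leaves alone by default.
    return "/page/" + quote(page_id, safe="")


path = page_path("/foo")
# Splitting on "/" before unescaping recovers the original id.
recovered_id = unquote(path.split("/")[-1])
# Unescaping the whole path first produces a different path with an empty segment.
collapsed = unquote(path)
```

Here `path` is `/page/%2Ffoo` and `recovered_id` is `/foo`, whereas `collapsed` is `/page//foo`, which no longer carries the id `/foo` in its last segment.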
> Encoded forward slashes are not preserved when rewriting URI
> ------------------------------------------------------------
>
> Key: HTTPCLIENT-1968
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1968
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Affects Versions: 4.5.7
> Reporter: Jay Modi
> Priority: Major
> Attachments: rewrite_preserve_forward_slash.diff
>
>
> URIs that contain an encoded forward slash (%2F) are no longer preserved when
> the HTTP client executes. I came across this when upgrading from 4.5.2 to
> 4.5.7 and my requests that contained an encoded forward slash suddenly
> started failing. This appears to be due to decoding and re-encoding of the
> path that takes place in the URIUtils#rewriteURI method. I've attached a
> patch that restores the old behavior but if a URI contains two slashes in a
> row in addition to an encoded slash the encoded forward slash will be decoded.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)