[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758176#comment-16758176
 ] 

Clinton Gormley commented on HTTPCLIENT-1968:
---------------------------------------------

RFC 2396 says:

2:
{quote}Within a URI, characters are either used as delimiters, or to represent 
strings of data (octets) within the delimited portions. Octets are either 
represented directly by a character (using the US-ASCII character for that 
octet [ASCII]) or by an escape encoding.{quote}
2.2:
{quote}Many URI include components consisting of or delimited by, certain 
special characters. These characters are called "reserved", since their usage 
within the URI component is limited to their reserved purpose. If the data for 
a URI component would conflict with the reserved purpose, then the conflicting 
data must be escaped before forming the URI.{quote}
2.4.2:
{quote}A URI is always in an "escaped" form, since escaping or unescaping a 
completed URI might change its semantics. Normally, the only time escape 
encodings can safely be made is when the URI is being created from its 
component parts; each component may have its own set of characters that are 
reserved, so only the mechanism responsible for generating or interpreting that 
component can determine whether or not escaping a character will change its 
semantics. Likewise, a URI must be separated into its components before the 
escaped characters within those components can be safely decoded.{quote}
Treating `/p1/%2Fp2` as equivalent to `/p1//p2` is incorrect.  As explained in 
2.4.2, you cannot unescape the path without first breaking it into its 
components (by splitting on `/`).  Then you need to re-escape before 
reassembling the path (concat'ing with `/`).
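
A minimal sketch in Java of the split / unescape / re-escape order described 
above, using only the JDK (Java 10 or later).  The names `PathSegments`, 
`decodeSegments` and `encodePath` are made up for illustration and are not part 
of HttpClient, and `URLEncoder` does form encoding, so it is only a rough 
stand-in for a proper path-segment escaper (it escapes more than strictly 
necessary).

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not HttpClient API.
class PathSegments {

    // Split the raw (still escaped) path on '/' first, then unescape each
    // segment on its own, so an escaped %2F never turns into a delimiter.
    static List<String> decodeSegments(String rawPath) {
        List<String> segments = new ArrayList<>();
        for (String raw : rawPath.split("/", -1)) {
            // URLDecoder treats '+' as a space; protect literal '+' first.
            String protectedRaw = raw.replace("+", "%2B");
            segments.add(URLDecoder.decode(protectedRaw, StandardCharsets.UTF_8));
        }
        return segments;
    }

    // Re-escape each segment before joining with '/', so a '/' that is data
    // inside a segment goes back out as %2F.
    static String encodePath(List<String> segments) {
        List<String> escaped = new ArrayList<>();
        for (String segment : segments) {
            // URLEncoder writes spaces as '+'; a path wants %20 instead.
            escaped.add(URLEncoder.encode(segment, StandardCharsets.UTF_8)
                    .replace("+", "%20"));
        }
        return String.join("/", escaped);
    }

    public static void main(String[] args) {
        List<String> a = decodeSegments("/p1/%2Fp2");
        List<String> b = decodeSegments("/p1//p2");
        System.out.println(a.size() + " vs " + b.size()); // prints "3 vs 4"
        System.out.println(encodePath(a));                // prints "/p1/%2Fp2"
        System.out.println(encodePath(b));                // prints "/p1//p2"
    }
}
```

The two paths decode to different segment lists (three segments versus four), 
which is why they cannot be treated as equivalent.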

Imagine you had a path template like this: `/page/{id}`, and a page whose id 
was `/foo`.  You have to escape the `/` in the id before forming the path, so 
you would end up with `/page/%2Ffoo`.  The way you're doing normalisation 
changes the meaning of the original path, and so is disallowed by 2.4.2.
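
The `/page/{id}` case can be sketched the same way in plain Java.  `PagePath` 
and `pagePath` are hypothetical names used only for this example, and 
`URLEncoder` is again just an approximation of a real path-segment escaper.  
The second half uses `java.net.URI` to show how the escaped and decoded views 
of the same path differ.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not HttpClient API.
class PagePath {

    // Escape the id as data before it is placed into the path, so that any
    // '/' inside the id cannot be mistaken for a segment delimiter.
    static String pagePath(String id) {
        String escapedId = URLEncoder.encode(id, StandardCharsets.UTF_8)
                .replace("+", "%20"); // URLEncoder writes spaces as '+'
        return "/page/" + escapedId;
    }

    public static void main(String[] args) {
        String path = pagePath("/foo");
        System.out.println(path);                  // prints "/page/%2Ffoo"

        URI uri = URI.create(path);
        // The raw path keeps the escape and still has two segments.
        System.out.println(uri.getRawPath());      // prints "/page/%2Ffoo"
        // The decoded path has lost the distinction between data and delimiter.
        System.out.println(uri.getPath());         // prints "/page//foo"
    }
}
```

Once the path has been decoded as a whole, there is no way to tell that the 
second `/` was part of the id rather than a delimiter.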

 

> Encoded forward slashes are not preserved when rewriting URI
> ------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1968
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1968
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>    Affects Versions: 4.5.7
>            Reporter: Jay Modi
>            Priority: Major
>         Attachments: rewrite_preserve_forward_slash.diff
>
>
> URIs that contain an encoded forward slash (%2F) are no longer preserved when 
> the HTTP client executes. I came across this when upgrading from 4.5.2 to 
> 4.5.7 and my requests that contained an encoded forward slash suddenly 
> started failing. This appears to be due to the decoding and re-encoding of 
> the path that takes place in the URIUtils#rewriteURI method. I've attached a 
> patch that restores the old behavior, but if a URI contains two slashes in a 
> row in addition to an encoded slash, the encoded forward slash will still be 
> decoded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org
For additional commands, e-mail: dev-h...@hc.apache.org
