[ https://issues.apache.org/jira/browse/HTTPCLIENT-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758176#comment-16758176 ]
Clinton Gormley commented on HTTPCLIENT-1968:
---------------------------------------------

RFC 2396 says:

2:
{quote}Within a URI, characters are either used as delimiters, or to represent strings of data (octets) within the delimited portions. Octets are either represented directly by a character (using the US-ASCII character for that octet [ASCII]) or by an escape encoding.{quote}

2.2:
{quote}Many URI include components consisting of or delimited by, certain special characters. These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI.{quote}

2.4.2:
{quote}A URI is always in an "escaped" form, since escaping or unescaping a completed URI might change its semantics. Normally, the only time escape encodings can safely be made is when the URI is being created from its component parts; each component may have its own set of characters that are reserved, so only the mechanism responsible for generating or interpreting that component can determine whether or not escaping a character will change its semantics. Likewise, a URI must be separated into its components before the escaped characters within those components can be safely decoded.{quote}

Treating `/p1/%2Fp2` as equivalent to `/p1//p2` is incorrect. As explained in 2.4.2, you cannot unescape the path without first breaking it into its components (by splitting on `/`). You then need to re-escape each component before reassembling the path (by joining with `/`).

Imagine you have a path template like `/page/{id}` and a page whose id is `/foo`. You have to escape the `/` in the id before forming the path, so you end up with `/page/%2Ffoo`.

The way you're doing normalisation changes the meaning of the original path, and so is disallowed by 2.4.2.
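To make the ordering concrete, here is a minimal Java sketch of the "split first, then decode; re-encode, then join" sequence that 2.4.2 describes. The class `PathSegments` and its method names are hypothetical and exist only for this illustration; they are not part of HttpClient or `URIUtils`. The encoder is deliberately conservative (it escapes every byte except ASCII letters, digits and `-._~`), which is an assumption of the sketch rather than the exact RFC 2396 character set.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not an HttpClient API.
final class PathSegments {

    private PathSegments() {
    }

    // Percent-decodes a single string as UTF-8.
    // Malformed escapes (e.g. "%zz") are passed through unchanged.
    static String decodeSegment(String segment) {
        byte[] in = segment.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2; // skip the two hex digits just consumed
                    continue;
                }
            }
            out.write(in[i]);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Percent-encodes a single segment. Assumption: everything except
    // ASCII letters, digits and "-._~" is escaped, so '/' becomes "%2F".
    static String encodeSegment(String segment) {
        StringBuilder sb = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9') || "-._~".indexOf(v) >= 0;
            if (safe) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    // Splits the raw (still escaped) path on '/' FIRST, then decodes each piece.
    static List<String> decodeSegments(String rawPath) {
        List<String> segments = new ArrayList<>();
        for (String raw : rawPath.split("/", -1)) {
            segments.add(decodeSegment(raw));
        }
        return segments;
    }

    // Re-encodes each segment, then joins the results with '/'.
    static String buildPath(List<String> segments) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segments.size(); i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(encodeSegment(segments.get(i)));
        }
        return sb.toString();
    }
}
```

With this sketch, `PathSegments.decodeSegments("/p1/%2Fp2")` yields the three segments `""`, `"p1"` and `"/p2"`, and passing them back through `buildPath` reproduces `/p1/%2Fp2`. Decoding the whole path before splitting, by contrast, gives `/p1//p2`, which splits into four segments, so the original structure is lost.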
> Encoded forward slashes are not preserved when rewriting URI
> ------------------------------------------------------------
>
>                 Key: HTTPCLIENT-1968
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1968
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>    Affects Versions: 4.5.7
>            Reporter: Jay Modi
>            Priority: Major
>         Attachments: rewrite_preserve_forward_slash.diff
>
>
> URIs that contain an encoded forward slash (%2F) are no longer preserved when
> the HTTP client executes. I came across this when upgrading from 4.5.2 to
> 4.5.7 and my requests that contained an encoded forward slash suddenly
> started failing. This appears to be due to decoding and re-encoding of the
> path that takes place in the URIUtils#rewriteURI method. I've attached a
> patch that restores the old behavior, but if a URI contains two slashes in a
> row in addition to an encoded slash, the encoded forward slash will be decoded.