[
https://issues.apache.org/jira/browse/HTTPCLIENT-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756479#comment-16756479
]
Jay Modi edited comment on HTTPCLIENT-1968 at 1/30/19 7:35 PM:
---------------------------------------------------------------
My apologies; the document I was looking at was actually not RFC 2396 but an
updated and expired draft:
[https://tools.ietf.org/id/draft-fielding-uri-rfc2396bis-07.txt]
In [section 2.4.2|https://tools.ietf.org/html/rfc2396#section-2.4.2] of the
actual RFC 2396, the following is stated:
{quote}2.4.2. When to Escape and Unescape
A URI is always in an "escaped" form, since escaping or unescaping a completed
URI might change its semantics. Normally, the only time escape encodings can
safely be made is when the URI is being created from its component parts; each
component may have its own set of characters that are reserved, so only the
mechanism responsible for generating or interpreting that component can
determine whether or not escaping a character will change its semantics.
Likewise, a URI must be separated into its components before the escaped
characters within those components can be safely decoded.
{quote}
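To illustrate the last sentence of that quote, here is a minimal sketch using only the JDK's {{java.net.URI}}. It is not HttpClient code; the class name {{EscapedSegments}}, its two helper methods, and the example URI are hypothetical. It compares splitting the raw (still escaped) path with splitting the path after it has been decoded:

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of HttpClient.
class EscapedSegments {

    // Split the raw path, so an escaped slash (%2F) stays inside its segment.
    static List<String> splitRaw(URI uri) {
        return Arrays.asList(uri.getRawPath().split("/"));
    }

    // Decode the whole path first, then split: the escaped slash has already
    // become a real separator, so one segment turns into two.
    static List<String> decodeThenSplit(URI uri) {
        return Arrays.asList(uri.getPath().split("/"));
    }
}
```

For {{URI.create("http://example.com/files/a%2Fb/meta")}}, {{splitRaw}} keeps {{a%2Fb}} as a single segment, while {{decodeThenSplit}} yields separate {{a}} and {{b}} segments, i.e. a different path hierarchy.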
I understand that path normalization is reasonable, but path normalization
should not change the resource referenced, which this change does. Admittedly
this comes from a newer standard, but [RFC 3986 Section
6.2.2.2|https://tools.ietf.org/html/rfc3986#section-6.2.2.2] states:
{quote}6.2.2.2. Percent-Encoding Normalization
The percent-encoding mechanism (Section 2.1) is a frequent source of variance
among otherwise identical URIs. In addition to the case normalization issue
noted above, some URI producers percent-encode octets that do not require
percent-encoding, resulting in URIs that are equivalent to their non-encoded
counterparts. These URIs should be normalized by decoding any percent-encoded
octet that corresponds to an unreserved character, as described in Section 2.3.
{quote}
The key here is that the section above only calls for decoding octets that
correspond to unreserved characters, whereas here a reserved character is being
decoded, which changes the meaning of the URI. RFC 2396 doesn't provide this
type of normalization guidance, but I do not see how decoding encoded reserved
characters, and thereby changing the meaning of a URI, is the right behavior.
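As a rough sketch of the distinction being drawn here (not the HttpClient implementation and not the attached patch), percent-encoding normalization limited to unreserved characters could look like the following. The class name {{UnreservedNormalizer}} and its methods are hypothetical; the unreserved set is taken from RFC 3986 Section 2.3:

```java
// Hypothetical helper for illustration only; not part of HttpClient.
class UnreservedNormalizer {

    // RFC 3986 Section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Decode %XX only when it encodes an unreserved character; every other
    // escape (for example %2F) is copied through exactly as written.
    static String normalize(String rawPath) {
        StringBuilder out = new StringBuilder(rawPath.length());
        int i = 0;
        while (i < rawPath.length()) {
            char ch = rawPath.charAt(i);
            if (ch == '%' && i + 2 < rawPath.length()) {
                int hi = Character.digit(rawPath.charAt(i + 1), 16);
                int lo = Character.digit(rawPath.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    int value = hi * 16 + lo;
                    if (isUnreserved(value)) {
                        out.append((char) value);           // safe to decode
                    } else {
                        out.append(rawPath, i, i + 3);      // keep the escape
                    }
                    i += 3;
                    continue;
                }
            }
            out.append(ch);  // ordinary character or malformed escape
            i++;
        }
        return out.toString();
    }
}
```

With this sketch, {{normalize("/a%2Fb/%7Euser/%41")}} returns {{/a%2Fb/~user/A}}: the tilde and the letter are decoded, while the encoded slash is preserved, so the path still has the same segments.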
> Encoded forward slashes are not preserved when rewriting URI
> ------------------------------------------------------------
>
> Key: HTTPCLIENT-1968
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1968
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Affects Versions: 4.5.7
> Reporter: Jay Modi
> Priority: Major
> Attachments: rewrite_preserve_forward_slash.diff
>
>
> URIs that contain an encoded forward slash (%2F) are no longer preserved when
> the HTTP client executes. I came across this when upgrading from 4.5.2 to
> 4.5.7 and my requests that contained an encoded forward slash suddenly
> started failing. This appears to be due to the decoding and re-encoding of the
> path that takes place in the URIUtils#rewriteURI method. I've attached a
> patch that restores the old behavior, but if a URI contains two slashes in a
> row in addition to an encoded slash, the encoded forward slash will be decoded.