[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17154151#comment-17154151
 ] 

Mark Mielke commented on HTTPCLIENT-1995:
-----------------------------------------

For:

{quote}
Characters in the "reserved" set are not reserved in all contexts.
The set of characters actually reserved within any given URI
component is defined by that component
{quote}

Oleg conveniently snipped the last sentence. The full paragraph reads: 

{quote}
Characters in the "reserved" set are not reserved in all contexts. The set of 
characters actually reserved within any given URI component is defined by that 
component. In general, a character is reserved if the semantics of the URI 
changes if the character is replaced with its escaped US-ASCII encoding.
{quote}

"In general, a character is reserved if the semantics of the URI changes if the 
character is replaced with its escaped US-ASCII encoding."

Perhaps this is a question of "semantics"... but it's pretty clear to most of 
us that decoding `%26` to `&` in the case reported is a definite change in 
semantics. Possibly Oleg is thinking of "URI semantics" in the sense that 
"applications should not introduce their own semantics, they should only use 
those defined by the RFC", but this is entirely impractical. How do you safely 
encode a `&` in a path segment, and not have it interfere with `?query`, 
according to Oleg's understanding? The answer is that you cannot, which is 
what makes this interpretation so incredibly wrong.
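
To illustrate the point, here is a minimal sketch using only `java.net.URI` 
from the JDK. It is not HttpClient's `URIUtils.rewriteURI` code; the class 
name `AmpersandRoundTrip` and the helper `decodeAndRebuild` are hypothetical 
stand-ins for any normalization step that decodes the path and then 
re-encodes it. The sketch assumes that the multi-argument `URI` constructor 
leaves `&` unescaped in a path because it is a legal path character there, 
so the original `%26` cannot be recovered once the path has been decoded.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class AmpersandRoundTrip {

    // Hypothetical helper: rebuild a URI from its *decoded* components,
    // standing in for a decode-then-re-encode normalization step.
    static URI decodeAndRebuild(URI original) {
        try {
            return new URI(
                    original.getScheme(),
                    original.getAuthority(),
                    original.getPath(),      // decoded path: %26 has become &
                    original.getQuery(),
                    original.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI original = URI.create(
                "http://example.org/some/path%26with%20percent/encoded/segments");
        URI rebuilt = decodeAndRebuild(original);

        // The raw path of the original still carries the escaped ampersand.
        System.out.println("original raw path: " + original.getRawPath());
        // The rebuilt URI re-escapes the space but not the ampersand.
        System.out.println("rebuilt raw path:  " + rebuilt.getRawPath());
        // The two URIs no longer compare equal.
        System.out.println("equal: " + original.equals(rebuilt));
    }
}
```

The space survives the round trip because it always has to be escaped, while 
the ampersand does not, which is exactly the information loss the reporter 
describes.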

> Percent-encoded ampersand in URI path not preserved
> ---------------------------------------------------
>
>                 Key: HTTPCLIENT-1995
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1995
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.8, 4.5.9
>         Environment: Linux Mint 19, OpenJDK 8
>            Reporter: none_
>            Priority: Major
>
> Starting with HttpClient 4.5.8, percent-encoded ampersand characters in URI 
> path segments are not preserved any longer but written in decoded form to 
> wire due to path normalization performed by URIUtils.rewriteURI(URI, 
> HttpHost).
>  
> According to RFC 3986 (page 11+), the ampersand character is a delimiter and 
> thus needs to be percent-encoded when not used for this purpose. Path 
> normalization, as performed by HttpClient v4.5.8+, creates a new URI that is 
> not equivalent to the original URI and thus leads to misinterpretation on 
> server/receiver side.
> ??URIs that differ in the replacement of a reserved character with its??
> ??corresponding percent-encoded octet are not equivalent. Percent-??
> ??encoding a reserved character, or decoding a percent-encoded octet??
> ??that corresponds to a reserved character, will change how the URI is??
> ??interpreted by most applications??.
>   
> A very simple test case is as follows:
> {code:java}
> @Test
> public void testAmpersand() throws Throwable
> {
>     final URI uri = new URI("http://example.org/some/path%26with%20percent/encoded/segments");
>     final URI uri2 = URIUtils.rewriteURI(uri, null);
>         
>     Assert.assertEquals(uri, uri2);
> }
> {code}
>  
>  



