[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152532#comment-17152532
 ] 

Mark Mielke commented on HTTPCLIENT-1995:
-----------------------------------------

[~fielding]: Since it is a difference of opinion, and [~olegk] has determined 
that this is not a bug, the only real option is to either fork the project or 
stop using the normalization function by disabling it or working around it.

I've watched this issue with some concern, as it represents a wider problem. 
Meritocracy is very useful for open source projects, as it ensures that the 
people doing the work are free to get things done. But it eventually leans hard 
on the people who make the contributions deciding what is allowed to be 
contributed, and when the people in control disagree with the people who are 
not, this is where we end up. This problem is not limited to Apache 
HttpClient. It happens in many open source projects. Unfortunately, the 
alternative is design by large committee, and that can lead to almost nothing 
getting done.

At first I was surprised that there would even be a question about whether this 
was a bug. However, when an RFC was cited as showing that this was correct 
behaviour, and not a bug, I re-read the specifications mentioned (some new to me 
since I first learned any of this stuff decades ago!), and I am still surprised 
that this could be understood to be "correct" behaviour.

The RFC that is quoted as authoritative says:

[https://tools.ietf.org/html/rfc3986#section-2.2]
   The purpose of reserved characters is to provide a set of delimiting
   characters that are distinguishable from other data within a URI.
   URIs that differ in the replacement of a reserved character with its
   corresponding percent-encoded octet are not equivalent.  Percent-
   encoding a reserved character, or decoding a percent-encoded octet
   that corresponds to a reserved character, will change how the URI is
   interpreted by most applications.  Thus, characters in the reserved
   set are protected from normalization and are therefore safe to be
   used by scheme-specific and producer-specific algorithms for
   delimiting data subcomponents within a URI.

It seems to specifically call out "Percent-encoding a reserved character, or 
decoding a percent-encoded octet that corresponds to a reserved character, will 
change how the URI is interpreted by most applications. Thus, characters in the 
reserved set are protected from normalization and are therefore safe to be used 
by scheme-specific and producer-specific algorithms for delimiting data 
subcomponents within a URI."
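
The distinction can be checked with the JDK's own java.net.URI, independent of 
HttpClient. The following is only an illustrative sketch: the class and method 
names (ReservedEncodingDemo, sameUri, rawPath, decodedPath) are invented for 
this example and are not part of any library.

```java
import java.net.URI;

final class ReservedEncodingDemo {

    // java.net.URI compares the raw (still-encoded) components, so "%26" and "&" differ.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    // The raw path keeps percent-encoded octets exactly as they were written.
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // The decoded path loses the information that "&" was originally encoded.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String encoded = "http://example.org/some/path%26with%20percent/encoded/segments";
        String decoded = "http://example.org/some/path&with%20percent/encoded/segments";

        System.out.println(sameUri(encoded, decoded)); // false
        System.out.println(rawPath(encoded));          // /some/path%26with%20percent/encoded/segments
        System.out.println(decodedPath(encoded));      // /some/path&with percent/encoded/segments
    }
}
```

Once only the decoded path is kept, a server that uses "&" as a sub-delimiter 
inside a path segment can no longer tell the two forms apart.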

Language is important, but unfortunately complex language can be open to 
interpretation or misinterpretation, and the authors of a document are not 
always aware of the misunderstandings that might exist and need to be 
addressed (although it seems these particular authors *were* aware, and did 
*exactly* this).

From a practical standpoint: What is the purpose of %-encoding reserved 
characters if *not* to preserve them in this way? Who here seriously believes 
that %-encoding a reserved character is just a different presentation of the 
same data, that normalization should be free to decode it at any time, and 
that the application is broken if it presumes it can safely %-encode a 
reserved character and have it preserved from the client to the server, 
leaving the application to determine what to make of it?

I think the correct action here would be to acknowledge that there is a bug 
here, and that it should be fixed, although "fixing" it might require effort, 
so it might not be possible to do immediately.

In the meantime, anybody who relies on the behaviour the RFC describes 
("Percent-encoding a reserved character, or decoding a percent-encoded octet 
that corresponds to a reserved character, will change how the URI is 
interpreted by most applications. Thus, characters in the reserved set are 
protected from normalization and are therefore safe to be used by 
scheme-specific and producer-specific algorithms for delimiting data 
subcomponents within a URI.") should avoid using Apache HttpClient-based 
functions which perform normalization. There are other libraries that don't 
have this defect.
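
One way to work around a rewrite step that cannot be trusted is to compare the 
raw path before and after it, and keep the original URI when they differ. The 
sketch below is a rough illustration under stated assumptions: it does not call 
HttpClient at all, lossyRewrite is a hypothetical stand-in for any normalizer 
that decodes and re-encodes the path, and the names RewriteGuard, 
preservesRawPath and rewriteOrKeep are invented for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class RewriteGuard {

    // Hypothetical stand-in for a normalizer that decodes the path and re-encodes it.
    // The multi-argument URI constructor only quotes illegal characters, so a decoded
    // "&" stays decoded while a decoded space becomes "%20" again.
    static URI lossyRewrite(URI uri) {
        try {
            return new URI(uri.getScheme(), uri.getAuthority(),
                    uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True when the rewrite left the still-encoded path byte-for-byte unchanged.
    static boolean preservesRawPath(URI before, URI after) {
        return String.valueOf(before.getRawPath())
                .equals(String.valueOf(after.getRawPath()));
    }

    // Use the rewritten URI only if the raw path survived; otherwise keep the original.
    static URI rewriteOrKeep(URI uri) {
        URI rewritten = lossyRewrite(uri);
        return preservesRawPath(uri, rewritten) ? rewritten : uri;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://example.org/some/path%26with%20percent/encoded/segments");
        System.out.println(lossyRewrite(uri));  // the "%26" has become "&"
        System.out.println(rewriteOrKeep(uri)); // the original URI, "%26" intact
    }
}
```

Comparing raw paths is deliberately strict: it would also reject a legitimate 
rewrite such as dot-segment removal, so a real guard might need a narrower 
check.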

> Percent-encoded ampersand in URI path not preserved
> ---------------------------------------------------
>
>                 Key: HTTPCLIENT-1995
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1995
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.8, 4.5.9
>         Environment: Linux Mint 19, OpenJDK 8
>            Reporter: none_
>            Priority: Major
>
> Starting with HttpClient 4.5.8, percent-encoded ampersand characters in URI 
> path segments are not preserved any longer but written in decoded form to 
> wire due to path normalization performed by URIUtils.rewriteURI(URI, 
> HttpHost).
>  
> According to RFC 3986 (page 11+), the ampersand character is a delimiter and 
> thus needs to be percent-encoded when not used for this purpose. Path 
> normalization, as performed by HttpClient v4.5.8+, creates a new URI that is 
> not equivalent to the original URI and thus leads to misinterpretation on 
> server/receiver side.
> ??URIs that differ in the replacement of a reserved character with its??
> ??corresponding percent-encoded octet are not equivalent. Percent-??
> ??encoding a reserved character, or decoding a percent-encoded octet??
> ??that corresponds to a reserved character, will change how the URI is??
> ??interpreted by most applications??.
>   
> A very simple test case is as follows:
> {code:java}
> @Test
> public void testAmpersand() throws Throwable
> {
>     final URI uri = new URI("http://example.org/some/path%26with%20percent/encoded/segments");
>     final URI uri2 = URIUtils.rewriteURI(uri, null);
>         
>     Assert.assertEquals(uri, uri2);
> }
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
