[ 
https://issues.apache.org/jira/browse/HTTPCLIENT-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153265#comment-17153265
 ] 

Mark Mielke edited comment on HTTPCLIENT-1995 at 7/8/20, 6:28 AM:
------------------------------------------------------------------

This is how I saw things unfold:
 # I believe there was an attempt to address a legitimate problem related to
URL normalization. It had to do with either "//" or "/./" or possibly UTF-8
characters; I'm not sure which. It was probably a real problem that should be
fixed.
 # The change was seen to be of minor impact, because the fix seemed
straightforward and the problem seemed legitimate. A well-intended fix was
implemented to normalize the URL based upon an interpretation of a standard,
possibly reusing the work of others, which was expected to be stable and
standard.
 # The change was added in a patch release, rather than waiting for a new minor 
release, or new major release.
 # The change was discovered to break user expectations related to the use of
reserved characters. The normalization newly applied in this patch release
changed the URL in a way that changed its meaning by the time it reached the
server. This was reported in this issue.
 # The initial response to this issue was that the server was clearly broken,
since the characters are the same whether encoded or not, and that Apache
HttpClient was in the right, correctly normalizing the characters.
 # Quotations from the specifications make it clear (to some of us, anyway)
that this interpretation is wrong. It must be possible to %-encode reserved
characters so that they are treated as literal data rather than as URL
delimiters, and these encoded characters are required to be passed through
unchanged to the server application, which decides what to do with them.
 # Further quotations from the specifications and their history try to muddy
this by claiming that the newest specification doesn't apply (???), or that it
is OK to use the prior interpretation (???).
 # Downstream users are broken, and since the Apache HttpClient issue is not
being addressed, downstream users are finding ways to work around this *defect*
by either disabling the normalization feature or using an alternate
implementation that does not have this defect.
 # It's one year later, and while the breakage was inserted urgently, the issue
has been determined to be "Invalid" and the behaviour left broken.
 # I'm pointing out that this is a problem with community-based projects that 
isn't unique to Apache HttpClient. It's terribly frustrating when it happens to 
us. But, the opposite extreme of design by committee also isn't without 
concerns.
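
To illustrate the point about reserved characters, here is a minimal sketch that uses only java.net.URI from the JDK, not HttpClient. The class name ReservedEncodingDemo and the example URL are made up for illustration, and "%2F" is used instead of the "%26" from this issue only because its effect on the path structure is easy to show: once the encoded octet is decoded, the path no longer has the segments the sender wrote.

```java
import java.net.URI;

public class ReservedEncodingDemo {

    // Counts path segments in the raw path, i.e. with percent-encoding left intact.
    static int rawSegmentCount(URI uri) {
        return countSegments(uri.getRawPath());
    }

    // Counts path segments after java.net.URI has percent-decoded the path.
    static int decodedSegmentCount(URI uri) {
        return countSegments(uri.getPath());
    }

    // Splits on "/" and ignores empty pieces such as the one before the leading slash.
    private static int countSegments(String path) {
        int count = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical URL: the second segment contains an encoded "/" as literal data.
        URI uri = URI.create("http://example.org/files/a%2Fb/meta");

        System.out.println("raw path:     " + uri.getRawPath());
        System.out.println("decoded path: " + uri.getPath());
        System.out.println("raw segments:     " + rawSegmentCount(uri));
        System.out.println("decoded segments: " + decodedSegmentCount(uri));
    }
}
```

A client that writes the decoded form to the wire sends a different resource identifier than the caller asked for, which is the kind of change in meaning described above.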

As for what you *should* do:
 # If something is broken, by all means fix it. But fix it in a way that does
not abandon the interests of your users. This means that behaviour changes
should normally be deferred to the next minor or major release, clearly
communicated, and discussed in advance.
 # When a mistake is made, such as by believing that a change will have low or 
no impact, but finding that it is actually quite impactful, the change should 
be fixed or reverted just as quickly as it was inserted in the first place. 
This is a responsibility for project owners.
 # If there are communication issues between native and non-native English
speakers, that's exactly why communication with the broader community before
making changes is so important, rather than one or two people making a decision
on their own without any outside input. Ask: "If I make this change, who will
it break?"


> Percent-encoded ampersand in URI path not preserved
> ---------------------------------------------------
>
>                 Key: HTTPCLIENT-1995
>                 URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1995
>             Project: HttpComponents HttpClient
>          Issue Type: Bug
>          Components: HttpClient (classic)
>    Affects Versions: 4.5.8, 4.5.9
>         Environment: Linux Mint 19, OpenJDK 8
>            Reporter: none_
>            Priority: Major
>
> Starting with HttpClient 4.5.8, percent-encoded ampersand characters in URI
> path segments are no longer preserved but are written to the wire in decoded
> form, due to path normalization performed by URIUtils.rewriteURI(URI,
> HttpHost).
>  
> According to RFC 3986 (page 11+), the ampersand character is a delimiter and 
> thus needs to be percent-encoded when not used for this purpose. Path 
> normalization, as performed by HttpClient v4.5.8+, creates a new URI that is 
> not equivalent to the original URI and thus leads to misinterpretation on 
> server/receiver side.
> ??URIs that differ in the replacement of a reserved character with its??
> ??corresponding percent-encoded octet are not equivalent. Percent-??
> ??encoding a reserved character, or decoding a percent-encoded octet??
> ??that corresponds to a reserved character, will change how the URI is??
> ??interpreted by most applications??.
>   
> A very simple test case is as follows:
> {code:java}
> @Test
> public void testAmpersand() throws Throwable
> {
>     final URI uri = new URI(
>             "http://example.org/some/path%26with%20percent/encoded/segments");
>     final URI uri2 = URIUtils.rewriteURI(uri, null);
>         
>     Assert.assertEquals(uri, uri2);
> }
> {code}
>  
>  
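
For comparison, the following sketch shows a syntax-based normalization that removes dot segments without decoding percent-encoded reserved characters. It relies only on java.net.URI.normalize() from the JDK and says nothing about how URIUtils.rewriteURI is implemented. The class name DotSegmentNormalizationDemo and the input URL (the URL from the test case with "." and ".." segments added) are assumptions for illustration.

```java
import java.net.URI;

public class DotSegmentNormalizationDemo {

    // Removes "." and ".." path segments. java.net.URI works on the raw path here,
    // so percent-encoded octets such as %26 and %20 are left exactly as written.
    static String normalizeKeepingEncoding(String url) {
        return URI.create(url).normalize().toString();
    }

    public static void main(String[] args) {
        // Hypothetical input: the test-case URL with dot segments added.
        String input =
            "http://example.org/some/./path%26with%20percent/encoded/../segments";
        System.out.println(normalizeKeepingEncoding(input));
    }
}
```

With this input, the dot segments are removed and "%26" stays encoded, so the normalized URI is still equivalent to the original in the RFC 3986 sense quoted above.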



--
This message was sent by Atlassian Jira
(v8.3.4#803005)