[
https://issues.apache.org/jira/browse/HTTPCLIENT-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153265#comment-17153265
]
Mark Mielke edited comment on HTTPCLIENT-1995 at 7/8/20, 6:24 AM:
------------------------------------------------------------------
This is how I saw things unfold:
# I believe a legitimate problem related to URL normalization was being
addressed. It had to do with either "//", "/./", or possibly UTF-8 characters;
I'm not sure which. It was probably a real problem that should be fixed.
# The change was seen as having minor impact, because the fix seemed
straightforward and the problem seemed legitimate. A well-intended fix was
implemented to normalize the URL based upon an interpretation of a standard,
possibly reusing the work of others that was expected to be stable and
standards-compliant.
# The change was added in a patch release, rather than waiting for a new minor
release, or new major release.
# The change was discovered to break user expectations related to the use of
reserved characters. The normalization newly applied in this patch release
changed the URL in a way that altered its meaning by the time it reached the
server. This was reported in this issue.
# The initial response to this issue was that the server was clearly broken,
since the characters are the same whether encoded or not, and that Apache
HttpClient was in the right to normalize them.
# Various quotations from the specifications make it clear (to some of us,
anyway) that this interpretation is wrong. It must be possible to %-encode
reserved characters as literals, so that they bypass URL interpretation, and
these encoded characters are required to be passed through to the server
application, which decides what to do with them.
# Further quoting of the specifications and history tries to muddy this by
claiming that the newest specification doesn't apply (???), or that it is OK
to use the prior interpretation (???).
# Downstream users are broken, and since the Apache HttpClient issue is not
being addressed, they are working around this *defect* either by disabling the
normalization feature or by using an alternate implementation that does not
have this defect.
# It's one year later, and although the breaking change was inserted urgently,
the issue has been resolved as "Invalid" and the behaviour left broken.
# I'm pointing out that this problem isn't unique to Apache HttpClient; it
affects community-based projects in general. It's terribly frustrating when it
happens to us. But the opposite extreme, design by committee, has its own
concerns.
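A minimal sketch of the distinction in points 5 and 6, using only java.net.URI
from the JDK (not HttpClient). The class name and the segmentParams() helper
are hypothetical: the helper stands in for a server that treats '&' as a
delimiter within a path segment and splits before percent-decoding. It shows
that java.net.URI keeps %26 and a literal '&' distinct, that getPath() (unlike
getRawPath()) erases the difference, and that the hypothetical server reads one
value in one case and two in the other.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;

public class ReservedCharDemo {

    // Returns the last raw (still percent-encoded) path segment of a URI.
    static String lastRawSegment(URI uri) {
        String raw = uri.getRawPath();
        return raw.substring(raw.lastIndexOf('/') + 1);
    }

    // Hypothetical server-side parser: splits a raw path segment on '&'
    // BEFORE percent-decoding, so "%26" stays inside a single value.
    static List<String> segmentParams(String rawSegment) {
        return Arrays.asList(rawSegment.split("&", -1));
    }

    public static void main(String[] args) throws URISyntaxException {
        URI encoded = new URI("http://example.org/some/path%26with");
        URI decoded = new URI("http://example.org/some/path&with");

        // java.net.URI compares raw forms, so these are not equal.
        System.out.println(encoded.equals(decoded));   // false

        // getRawPath() keeps %26; getPath() decodes it to '&'.
        System.out.println(encoded.getRawPath());      // /some/path%26with
        System.out.println(encoded.getPath());         // /some/path&with

        // The hypothetical server reads one value vs. two values.
        System.out.println(segmentParams(lastRawSegment(encoded))); // [path%26with]
        System.out.println(segmentParams(lastRawSegment(decoded))); // [path, with]
    }
}
```

The design point is that decoding has to happen after the server has split on
delimiters; a client that decodes %26 first has already made that decision on
the server's behalf.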
As to what *should* you do?
# If something is broken, by all means fix it. But fix it in a way that does
not abandon the interests of your users. Normally, this means deferring
behaviour changes to the next minor or major release, discussing them in
advance, and communicating them clearly.
# When a mistake is made, such as believing that a change will have little or
no impact and then finding that it is actually quite impactful, the change
should be fixed or reverted just as quickly as it was inserted in the first
place. This is the responsibility of project owners.
# If there are communication issues between native and non-native English
speakers, that is exactly why communicating with the broader community before
making changes is so important, rather than one or two people deciding on
their own without any outside input. Ask: "If I make this change, who will it
break?"
> Percent-encoded ampersand in URI path not preserved
> ---------------------------------------------------
>
> Key: HTTPCLIENT-1995
> URL: https://issues.apache.org/jira/browse/HTTPCLIENT-1995
> Project: HttpComponents HttpClient
> Issue Type: Bug
> Components: HttpClient (classic)
> Affects Versions: 4.5.8, 4.5.9
> Environment: Linux Mint 19, OpenJDK 8
> Reporter: none_
> Priority: Major
>
> Starting with HttpClient 4.5.8, percent-encoded ampersand characters in URI
> path segments are no longer preserved but are written to the wire in decoded
> form, due to path normalization performed by URIUtils.rewriteURI(URI,
> HttpHost).
>
> According to RFC 3986 (page 11+), the ampersand character is a delimiter and
> thus needs to be percent-encoded when not used for this purpose. Path
> normalization, as performed by HttpClient 4.5.8+, creates a new URI that is
> not equivalent to the original URI and thus leads to misinterpretation on the
> server/receiver side.
> ??URIs that differ in the replacement of a reserved character with its
> corresponding percent-encoded octet are not equivalent. Percent-encoding a
> reserved character, or decoding a percent-encoded octet that corresponds to a
> reserved character, will change how the URI is interpreted by most
> applications.??
>
> A very simple test case is as follows:
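> {code:java}
> import java.net.URI;
>
> import org.apache.http.client.utils.URIUtils;
> import org.junit.Assert;
> import org.junit.Test;
>
> public class AmpersandPathTest // class name is arbitrary
> {
>     // Per this report, fails on 4.5.8 and 4.5.9: rewriteURI() returns a path
>     // containing a literal '&' instead of the original %26.
>     @Test
>     public void testAmpersand() throws Throwable
>     {
>         final URI uri = new URI(
>                 "http://example.org/some/path%26with%20percent/encoded/segments");
>         final URI uri2 = URIUtils.rewriteURI(uri, null);
>         Assert.assertEquals(uri, uri2);
>     }
> }
> {code}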
>
>
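For comparison, here is a minimal sketch (plain java.net.URI, not HttpClient's
URIUtils) of percent-encoding normalization that stays within what RFC 3986
sections 6.2.2.1 and 6.2.2.2 treat as equivalence-preserving: only escapes of
unreserved characters (section 2.3) are decoded, and every other escape merely
has its hex digits uppercased. The class and method names are hypothetical, and
the sketch deliberately ignores dot-segment removal and non-path components.

```java
import java.net.URI;

public class SafePathNormalizer {

    // RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Value of an ASCII hex digit, or -1 if the char is not one.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    }

    // Decodes %XX only when it encodes an unreserved character (6.2.2.2);
    // every other escape is kept, with its hex digits uppercased (6.2.2.1).
    // Reserved characters such as %26 ('&') are therefore never decoded.
    static String normalizePercentEncoding(String rawPath) {
        StringBuilder out = new StringBuilder(rawPath.length());
        int i = 0;
        while (i < rawPath.length()) {
            char c = rawPath.charAt(i);
            if (c == '%' && i + 2 < rawPath.length()) {
                int hi = hexValue(rawPath.charAt(i + 1));
                int lo = hexValue(rawPath.charAt(i + 2));
                if (hi >= 0 && lo >= 0) {
                    int value = hi * 16 + lo;
                    if (isUnreserved(value)) {
                        out.append((char) value);
                    } else {
                        out.append(String.format("%%%02X", value));
                    }
                    i += 3;
                    continue;
                }
            }
            // Any other character (or a malformed '%') is copied unchanged.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        URI uri = URI.create(
                "http://example.org/some/path%26with%20percent/encoded/segments");
        String raw = uri.getRawPath();
        // %26 ('&') and %20 (space) are not unreserved, so nothing changes.
        System.out.println(normalizePercentEncoding(raw).equals(raw)); // true
        // %7e ('~') and %41 ('A') are decoded; %2f ('/') is kept as %2F.
        System.out.println(normalizePercentEncoding("/%7euser/%41b%2f")); // /~user/Ab%2F
    }
}
```

Applied to the reporter's test URI, the raw path comes back unchanged, which is
the outcome the testAmpersand() assertion above expects from rewriteURI().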