carterkozak commented on a change in pull request #274:
URL:
https://github.com/apache/httpcomponents-core/pull/274#discussion_r602328865
##########
File path: httpcore5/src/main/java/org/apache/hc/core5/net/URIBuilder.java
##########
@@ -292,6 +292,8 @@ private String buildString() {
}
if (InetAddressUtils.isIPv6Address(this.host)) {
sb.append("[").append(this.host).append("]");
+ } else if
(InetAddressUtils.isIPv6URLBracketedAddress(this.host)) {
Review comment:
> So colon matches to pct-encoded, but represents an illegal hostname.
Can you expand upon why `%3A%3A1` represents an illegal reg-name and
`%5B%3A%3A1%5D` does not?
The rules defined in RFC 3986 assume either we're parsing a URI from a
string value where it has been correctly encoded, or we have data that we have
already classified as either in ipv4, ipv6, or host identifier. Unfortunately
it doesn't give us much guidance when we have an arbitrary identifier that may
or may not need to be encoded based on whether it's an ipv6 address, or a
hostname, though if the RFC states some hostnames are illegal/reserved for URI
syntax, this may not be an issue (apologies if I have missed something simple).
We could store two values, one string representation of an ip address (v4 or
v6) and a second to represent an unencoded hostname, which may be percent
encoded when we render the URI.
However this introduces a new problem[1], and fails to solve an existing
problem[2]:
1. Using ipv6, the return value of getHost changes from `[::1]` to `::1`.
Consumers are responsible for determining whether the value is an unencoded
hostname that must be percent-encoded, or an ipv6 address, just as we are
attempting here.
2. While the system should work properly parsing from a full URI, the
`setHost` method is still ambiguous and should probably be split into
`setHostIP` and `setHostName` to convey intent. Otherwise we're likely to
attempt to resolve hostnames for inputs that the developer expected to be an IP
address. I'd much rather throw an exception in this case than send data to the
wrong host.
Miscellaneous note: java 14 introduced changes to
`InetSocketAddress.toString` which borrow IPv6 URI encoding outside of the URI
space, so hosts with square braces parsed from the result of
`InetSocketAddress.toString().split(":", 2)[0]` are more likely to be passed to
URIBuilder.setHost.
https://bugs.openjdk.java.net/browse/JDK-8225499
https://hg.openjdk.java.net/jdk/jdk/rev/eb172a3b1c1c
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]