carterkozak commented on a change in pull request #274:
URL: 
https://github.com/apache/httpcomponents-core/pull/274#discussion_r602328865



##########
File path: httpcore5/src/main/java/org/apache/hc/core5/net/URIBuilder.java
##########
@@ -292,6 +292,8 @@ private String buildString() {
                 }
                 if (InetAddressUtils.isIPv6Address(this.host)) {
                     sb.append("[").append(this.host).append("]");
+                } else if 
(InetAddressUtils.isIPv6URLBracketedAddress(this.host)) {

Review comment:
       > So colon matches to pct-encoded, but represents an illegal hostname.
   
   Can you expand upon why `%3A%3A1` represents an illegal reg-name and 
`%5B%3A%3A1%5D` does not?
   
   The rules defined in RFC 3986 assume either we're parsing a URI from a 
string value where it has been correctly encoded, or we have data that we have 
already classified as either in ipv4, ipv6, or host identifier. Unfortunately 
it doesn't give us much guidance when we have an arbitrary identifier that may 
or may not need to be encoded based on whether it's an ipv6 address, or a 
hostname, though if the RFC states some hostnames are illegal/reserved for URI 
syntax, this may not be an issue (apologies if I have missed something simple).
   
   We could store two values, one string representation of an ip address (v4 or 
v6) and a second to represent an unencoded hostname, which may be percent 
encoded when we render the URI.
   
   However this introduces a new problem[1], and fails to solve an existing 
problem[2]:
   1. Using ipv6, the return value of getHost changes from `[::1]` to `::1`. 
Consumers are responsible for determining whether the value is an unencoded 
hostname that must be percent-encoded, or an ipv6 address, just as we are 
attempting here.
   2. While the system should work properly parsing from a full URI, the 
`setHost` method is still ambiguous and should probably be split into 
`setHostIP` and `setHostName` to convey intent. Otherwise we're likely to 
attempt to resolve hostnames for inputs that the developer expected to be an IP 
address. I'd much rather throw an exception in this case than send data to the 
wrong host.
   
   Miscellaneous note: java 14 introduced changes to 
`InetSocketAddress.toString` which borrow IPv6 URI encoding outside of the URI 
space, so hosts with square braces parsed from the result of 
`InetSocketAddress.toString().split(":", 2)[0]` are more likely to be passed to 
URIBuilder.setHost.
   https://bugs.openjdk.java.net/browse/JDK-8225499
   https://hg.openjdk.java.net/jdk/jdk/rev/eb172a3b1c1c




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to