michael-o commented on a change in pull request #274:
URL: 
https://github.com/apache/httpcomponents-core/pull/274#discussion_r602789026



##########
File path: httpcore5/src/main/java/org/apache/hc/core5/net/URIBuilder.java
##########
@@ -292,6 +292,8 @@ private String buildString() {
                 }
                 if (InetAddressUtils.isIPv6Address(this.host)) {
                     sb.append("[").append(this.host).append("]");
+                } else if 
(InetAddressUtils.isIPv6URLBracketedAddress(this.host)) {

Review comment:
       > Can you expand upon why %3A%3A1 represents an illegal reg-name and 
%5B%3A%3A1%5D does not?
   
   The RFC says
   
         host        = IP-literal / IPv4address / reg-name
        The syntax rule for host is ambiguous because it does not completely
        distinguish between an IPv4address and a reg-name. 
   
   I consider both inputs you have provided to be semantically (not 
syntactically) illegal. The inputs you have provided do neither match to 
`IP-literal / IPv4address`. So, it must match `reg-name`, domain names have a 
fixed syntax. After decoding it is obvious that neither `::1` nor `[::1]` are 
valid domain names. They are syntactically valid `reg-names`. See 
[here](https://tools.ietf.org/html/rfc1034#section-3.5). The input is highly 
ambiguous.
   
   This is an interesting example:
   ```
   $ ifconfig | grep inet6
           inet6 fe80::yyy:9466%em0 prefixlen 64 scopeid 0x1
           inet6 2003:xxx:fed2:9466 prefixlen 64 autoconf
           inet6 ::1 prefixlen 128
           inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
   $ python3
   Python 3.7.10 (default, Mar 13 2021, 20:44:50)
   [Clang 10.0.1 ([email protected]:llvm/llvm-project.git 
llvmorg-10.0.1-0-gef32c611a on freebsd12
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import socket
   >>> socket.gethostbyname('::1')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   >>> socket.gethostbyname('127.0.0.1')
   '127.0.0.1'
   >>> socket.gethostbyname('[::1]')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   >>> socket.gethostbyname('fe80::yyy:9466')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   >>> socket.gethostbyname('[fe80::yyy:9466]')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   >>> socket.gethostbyname('fe80::yyy:9466%em0')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   >>> socket.gethostbyname('[fe80::yyy:9466%em0]')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   >>> socket.gethostbyname('[2003:xxx:fed2:9466]')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   >>> socket.gethostbyname('2003:xxx:fed2:9466')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   >>> socket.gethostbyname('fe80::1%lo0')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   >>> socket.gethostbyname('[fe80::1%lo0]')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
   socket.gaierror: [Errno 8] Name does not resolve
   ```
   
   routing table looks good:
   ```
   $ netstat -rn6
   Routing tables
   
   Internet6:
   Destination                       Gateway                       Flags     
Netif Expire
   ::/96                             ::1                           UGRS        
lo0
   default                           fe80::yyy:2772%em0            UG          
em0
   ::1                               link#2                        UH          
lo0
   ::ffff:0.0.0.0/96                 ::1                           UGRS        
lo0
   xxx::/64                          link#1                        U           
em0
   xxx:219:99ff:fed2:9466            link#1                        UHS         
lo0
   fe80::/10                         ::1                           UGRS        
lo0
   fe80::%em0/64                     link#1                        U           
em0
   fe80::219:99ff:fed2:9466%em0      link#1                        UHS         
lo0
   fe80::%lo0/64                     link#2                        U           
lo0
   fe80::1%lo0                       link#2                        UHS         
lo0
   ff02::/16                         ::1                           UGRS        
lo0
   ```
   
   But I guess we cannot really restrict non-valid domain names and we 
shouldn't. Garbage in and garbage out.
   
   > We could store two values, one string representation of an ip address (v4 
or v6) and a second to represent an unencoded hostname, which may be percent 
encoded when we render the URI.
   
   Yes, internally we could (should) work we raw values, but RFC 3986 forces us 
to encode to help parsers. A technical detail. A `URIBuilder` is an abstraction 
from the actual syntax, after all.
   
   Both of your questions are perfectly valid and basically boil down to the 
question: Shall a URI builder operate (accept) on raw (unencoded) input and 
hide away all encoding and with getters return raw values or require encoded 
value already? If it requires pre-encoded values, I consider a builder 
pointless because it would do mere string concat. I want to supply raw values 
and have the builder to do the hard work for RFC 3986.
   
   Now let's get to the JDK changes: Oracle is, again, embarassing here citing 
RFC 2732. This one is dead and obsolete. The basic problem they needed to solve 
is the ambiguity with socket addresses: inet address with ports because IPv6 
literals use colons  just like host/port separators. The reason why they 
"abused" RFC 3986 also there is no URI here is because they did not want to 
make up their own syntax and reuse an already established one. Note that many 
other software uses the bracket notation to clearly separate IPv6 from the 
port. Otherwise how to interprete this inet socket address: `::1`. Is it `::` 
(any) on port 1 or just `::1` (localhost) with any port information?
   
   JDK-8225499 is a good proof of Oracle's inability to solve problems in time. 
They should have avoided this with the inception of `Inet6Address` and not 10 
years later. Pure BSD/POSIX sockets don't suffer from this issue:
   
   ```
   $ python3
   Python 3.7.10 (default, Mar 13 2021, 20:44:50)
   [Clang 10.0.1 ([email protected]:llvm/llvm-project.git 
llvmorg-10.0.1-0-gef32c611a on freebsd12
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import socket
   >>> sock = socket.socket(socket.AF_INET6)
   >>> sock
   <socket.socket fd=3, family=AddressFamily.AF_INET6, 
type=SocketKind.SOCK_STREAM, proto=0, laddr=('::', 0, 0, 0)>
   >>> sock.bind(('::1', 20000))
   >>> sock
   <socket.socket fd=3, family=AddressFamily.AF_INET6, 
type=SocketKind.SOCK_STREAM, proto=0, laddr=('::1', 20000, 0, 0)>
   >>> sock.listen(1)
   >>> conn, addr = sock.accept() # running `socat(1)`  in another terminal
   >>> addr
   ('::1', 44087, 0, 0)
   ```
   
   ```
   $ echo "HELLO" | socat - 'TCP:[::1]:20000'
   mosipov@bsd1srv:/usr/home/mosipov
   $ echo "HELLO" | socat - 'TCP:::1:20000'
   2021/03/27 21:31:41 socat[61890] E TCP: wrong number of parameters (4 
instead of 2)
   ```
   
   Brackets are just a technical necessity.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to