michael-o commented on a change in pull request #274:
URL:
https://github.com/apache/httpcomponents-core/pull/274#discussion_r602789026
##########
File path: httpcore5/src/main/java/org/apache/hc/core5/net/URIBuilder.java
##########
@@ -292,6 +292,8 @@ private String buildString() {
}
if (InetAddressUtils.isIPv6Address(this.host)) {
sb.append("[").append(this.host).append("]");
+ } else if
(InetAddressUtils.isIPv6URLBracketedAddress(this.host)) {
Review comment:
> Can you expand upon why %3A%3A1 represents an illegal reg-name and
%5B%3A%3A1%5D does not?
The RFC says
host = IP-literal / IPv4address / reg-name
The syntax rule for host is ambiguous because it does not completely
distinguish between an IPv4address and a reg-name.
I consider both inputs you have provided to be semantically (not
syntactically) illegal. The inputs you have provided do neither match to
`IP-literal / IPv4address`. So, it must match `reg-name`, domain names have a
fixed syntax. After decoding it is obvious that neither `::1` nor `[::1]` are
valid domain names. They are syntactically valid `reg-names`. See
[here](https://tools.ietf.org/html/rfc1034#section-3.5). The input is highly
ambiguous.
This is an interesting example:
```
$ ifconfig | grep inet6
inet6 fe80::yyy:9466%em0 prefixlen 64 scopeid 0x1
inet6 2003:xxx:fed2:9466 prefixlen 64 autoconf
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
$ python3
Python 3.7.10 (default, Mar 13 2021, 20:44:50)
[Clang 10.0.1 ([email protected]:llvm/llvm-project.git
llvmorg-10.0.1-0-gef32c611a on freebsd12
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.gethostbyname('::1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
>>> socket.gethostbyname('127.0.0.1')
'127.0.0.1'
>>> socket.gethostbyname('[::1]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
>>> socket.gethostbyname('fe80::yyy:9466')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
>>> socket.gethostbyname('[fe80::yyy:9466]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
>>> socket.gethostbyname('fe80::yyy:9466%em0')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
>>> socket.gethostbyname('[fe80::yyy:9466%em0]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
>>> socket.gethostbyname('[2003:xxx:fed2:9466]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
>>> socket.gethostbyname('2003:xxx:fed2:9466')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
>>> socket.gethostbyname('fe80::1%lo0')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
>>> socket.gethostbyname('[fe80::1%lo0]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
socket.gaierror: [Errno 8] Name does not resolve
```
routing table looks good:
```
$ netstat -rn6
Routing tables
Internet6:
Destination Gateway Flags
Netif Expire
::/96 ::1 UGRS
lo0
default fe80::yyy:2772%em0 UG
em0
::1 link#2 UH
lo0
::ffff:0.0.0.0/96 ::1 UGRS
lo0
xxx::/64 link#1 U
em0
xxx:219:99ff:fed2:9466 link#1 UHS
lo0
fe80::/10 ::1 UGRS
lo0
fe80::%em0/64 link#1 U
em0
fe80::219:99ff:fed2:9466%em0 link#1 UHS
lo0
fe80::%lo0/64 link#2 U
lo0
fe80::1%lo0 link#2 UHS
lo0
ff02::/16 ::1 UGRS
lo0
```
But I guess we cannot really restrict non-valid domain names and we
shouldn't. Garbage in and garbage out.
> We could store two values, one string representation of an ip address (v4
or v6) and a second to represent an unencoded hostname, which may be percent
encoded when we render the URI.
Yes, internally we could (should) work we raw values, but RFC 3986 forces us
to encode to help parsers. A technical detail. A `URIBuilder` is an abstraction
from the actual syntax, after all.
Both of your questions are perfectly valid and basically boil down to the
question: Shall a URI builder operate (accept) on raw (unencoded) input and
hide away all encoding and with getters return raw values or require encoded
value already? If it requires pre-encoded values, I consider a builder
pointless because it would do mere string concat. I want to supply raw values
and have the builder to do the hard work for RFC 3986.
Now let's get to the JDK changes: Oracle is, again, embarassing here citing
RFC 2732. This one is dead and obsolete. The basic problem they needed to solve
is the ambiguity with socket addresses: inet address with ports because IPv6
literals use colons just like host/port separators. The reason why they
"abused" RFC 3986 also there is no URI here is because they did not want to
make up their own syntax and reuse an already established one. Note that many
other software uses the bracket notation to clearly separate IPv6 from the
port. Otherwise how to interprete this inet socket address: `::1`. Is it `::`
(any) on port 1 or just `::1` (localhost) with any port information?
JDK-8225499 is a good proof of Oracle's inability to solve problems in time.
They should have avoided this with the inception of `Inet6Address` and not 10
years later. Pure BSD/POSIX sockets don't suffer from this issue:
```
$ python3
Python 3.7.10 (default, Mar 13 2021, 20:44:50)
[Clang 10.0.1 ([email protected]:llvm/llvm-project.git
llvmorg-10.0.1-0-gef32c611a on freebsd12
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> sock = socket.socket(socket.AF_INET6)
>>> sock
<socket.socket fd=3, family=AddressFamily.AF_INET6,
type=SocketKind.SOCK_STREAM, proto=0, laddr=('::', 0, 0, 0)>
>>> sock.bind(('::1', 20000))
>>> sock
<socket.socket fd=3, family=AddressFamily.AF_INET6,
type=SocketKind.SOCK_STREAM, proto=0, laddr=('::1', 20000, 0, 0)>
>>> sock.listen(1)
>>> conn, addr = sock.accept() # running `socat(1)` in another terminal
>>> addr
('::1', 44087, 0, 0)
```
```
$ echo "HELLO" | socat - 'TCP:[::1]:20000'
mosipov@bsd1srv:/usr/home/mosipov
$ echo "HELLO" | socat - 'TCP:::1:20000'
2021/03/27 21:31:41 socat[61890] E TCP: wrong number of parameters (4
instead of 2)
```
Brackets are just a technical necessity.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]