Hi,
I think this could be a useful addition. The apidoc would need to be
clear about the differences from the existing literal parsing method,
and it should spell out any implications of the ambiguity between octal
and decimal formats.
- Michael
On 28/03/2024 19:09, Sergey Chernyshev wrote:
Hi Net Dev team,
I posted this earlier to core-libs-dev mailing list, which was not a
proper place to discuss the net issues. Thanks Alan Bateman for
directing me to the right place.
I would like to propose a PR to extend the InetAddress API in JDK 23,
namely to provide an interface for constructing InetAddress objects
from literal addresses in POSIX/BSD form (please see the discussion
[1]). It is aimed at apps that need to mimic the behavior of the POSIX
network API (|inet_addr|) used by standard network utilities such as
netcat/curl/wget and the majority of web browsers. At present,
there's no way to construct an |InetAddress| object from such literal
addresses, because the new APIs |InetAddress.ofLiteral()| and
|Inet4Address.ofLiteral()| will consume an octal address and
successfully parse it as decimal, ignoring the octal prefix. Hence,
the resulting object will point to a different IP address than
expected: 010.0.0.1 means 8.0.0.1 to |inet_addr|, but is read as
10.0.0.1 when the prefix is ignored. There's also no direct way to
create an InetAddress from a literal address with hexadecimal segments,
although such addresses can occur on certain systems.
Historically |InetAddress.getByName()/.getAllByName()| were the only
way to convert a literal address into an InetAddress object.
The |getAllByName()| API relies on POSIX |getaddrinfo| / |inet_addr|,
which parses IP address segments with |strtoul| (accepts octal and
hexadecimal bases). The fallback to |getaddrinfo| is undesirable, as it
may result in blocking network queries if |inet_aton|
rejects the input literal address. The Java standard explicitly says that
|"If a literal IP address is supplied, only the validity of the
address format is checked." |
Aleksei Efimov contributed JDK-8272215 [2], which adds new factory
methods |.ofLiteral()| to the |InetAddress| classes. Although the new
API is not affected by the |getaddrinfo| fallback issue, it is not
sufficient for an app that wants to mimic the behavior of BSD/POSIX
network utilities. This applies in particular to Java apps that parse
or interpret the parameters of the standard tools, as well as their
configuration / environment.
I suggest adding a new factory method such as
|.ofPosixLiteral()| to the |Inet4Address| class to fill this gap. This
won't introduce ambiguity into the API and won't break the
long-standing behavior. As a new method, it will not affect Java
utilities such as HttpClient, nor existing Java applications. At the
same time, the new method will help to deal with the confusion between
the BSD and Java standards.
The parsing algorithm was added as part of JDK-8277608 [3]. It
requires a minor modification to produce 4 bytes of output (currently
it doesn't produce any output). The algorithm allows up to 4 segments
separated by dots (.). If there is more than 1 segment, the leading
segment(s) must not exceed 255, and the trailing segment must not
exceed 256^(5 - numberOfSegments) - 1. The algorithm rejects numbers
greater than 0xFF hex, 0377 octal, 255 decimal per octet. This differs
from .ofLiteral(), where the limit is simply 255 per octet, regardless
of leading 0s (the total length must not exceed 15). In
.ofPosixLiteral() there'd be no limit on the number of leading 0s,
which is also the case with inet_addr(). The corner cases for both
methods are numbers that are accepted by both but produce different
outputs, such as octal numbers between 010 and 0255. 0256 and above
are rejected by ofLiteral(), as are all hexadecimal numbers.
Zero-prefixed decimal numbers such as 0239 should be rejected by
ofPosixLiteral().
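To make the rules above concrete, here is a minimal sketch of such a
parser. It is only an illustration under stated assumptions, not the
JDK-8277608 code and not a proposed implementation: the class and
method names (PosixLiteralSketch, parse, parseSegment) are
hypothetical, and rejecting a bare "0x" segment is my own choice rather
than something the rules above prescribe.

```java
import java.util.Arrays;

// Hypothetical sketch, not JDK code: POSIX/BSD-style IPv4 literal parsing.
class PosixLiteralSketch {

    // Parses one segment: "0x"/"0X" prefix means hex, a leading "0" means
    // octal, anything else is decimal. Leading zeros are not limited.
    static long parseSegment(String seg) {
        int radix = 10;
        String digits = seg;
        if (seg.startsWith("0x") || seg.startsWith("0X")) {
            radix = 16;
            digits = seg.substring(2);
        } else if (seg.length() > 1 && seg.startsWith("0")) {
            radix = 8;
            digits = seg.substring(1);
        }
        if (digits.isEmpty()) {
            // Assumption: empty segments and a bare "0x" are rejected.
            throw new IllegalArgumentException("empty segment: '" + seg + "'");
        }
        long value = 0;
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            // ASCII digits only; Character.digit would also accept Unicode digits.
            int d = c < 128 ? Character.digit(c, radix) : -1;
            if (d < 0) {
                throw new IllegalArgumentException("bad digit in '" + seg + "'");
            }
            value = value * radix + d;
            if (value > 0xFFFFFFFFL) {
                throw new IllegalArgumentException("segment too large: '" + seg + "'");
            }
        }
        return value;
    }

    // Returns the 4 address bytes, or throws IllegalArgumentException.
    static byte[] parse(String literal) {
        String[] segs = literal.split("\\.", -1); // keep empty trailing segments
        int n = segs.length;
        if (n > 4) {
            throw new IllegalArgumentException("more than 4 segments");
        }
        long address = 0;
        for (int i = 0; i < n - 1; i++) {
            long v = parseSegment(segs[i]);
            if (v > 255) {
                throw new IllegalArgumentException("leading segment exceeds 255");
            }
            address |= v << (24 - 8 * i);
        }
        long last = parseSegment(segs[n - 1]);
        long maxLast = (1L << (8 * (5 - n))) - 1; // 256^(5 - n) - 1
        if (last > maxLast) {
            throw new IllegalArgumentException("trailing segment too large");
        }
        address |= last;
        return new byte[] {
            (byte) (address >>> 24), (byte) (address >>> 16),
            (byte) (address >>> 8), (byte) address
        };
    }

    public static void main(String[] args) {
        for (String s : new String[] {"010.0.0.1", "0x7f.1", "0239"}) {
            try {
                System.out.println(s + " -> " + Arrays.toString(parse(s)));
            } catch (IllegalArgumentException e) {
                System.out.println(s + " -> rejected: " + e.getMessage());
            }
        }
    }
}
```

With this sketch, 010.0.0.1 yields the bytes of 8.0.0.1 and 0x7f.1
yields 127.0.0.1, while 0239 is rejected because 9 is not an octal
digit.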
Standard tools may behave slightly differently under different
operating systems. For example, on macOS wget & nc disregard the octal
prefix (0) while allowing the hexadecimal prefix (0x), whereas
curl & ping process both prefixes. On Ubuntu Server
22.04 both prefixes are processed, but they are not allowed in the
/etc/hosts file, while on macOS it's legal to use 0x there. Despite the
deviations in how and where the BSD standard is implemented, there are
two distinct approaches. I don't see why Java shouldn't provide two
different independent APIs. It would give future apps the flexibility
to decide which standard to rely on, and the ability to see the full
picture.
Please share your thoughts on whether such a change might be desirable
in JDK 23. Thank you for your help!
Best regards
Sergey Chernyshev
[1] https://bugs.openjdk.org/browse/JDK-8315767
[2] https://bugs.openjdk.org/browse/JDK-8272215
[3] https://github.com/openjdk/jdk/commit/cdc1582