On Thu, Mar 28, 2024 at 2:10 PM Sergey Chernyshev < [email protected]> wrote:
> Hi Net Dev team,
>
> I posted this earlier to core-libs-dev mailing list, which was not a
> proper place to discuss the net issues. Thanks Alan Bateman for directing
> me to the right place.
>
> I would like to propose a PR to extend the InetAddress API in JDK 23,
> namely to provide an interface for constructing InetAddress objects from
> literal addresses in POSIX/BSD form (please see the discussion [1]), for
> apps that need to mimic the behavior of POSIX network APIs (inet_addr)
> used by standard network utilities such as netcat/curl/wget and the
> majority of web browsers. At present, there's no way to construct an
> InetAddress object from such literal addresses because the new API
> InetAddress.ofLiteral() and Inet4Address.ofLiteral() will consume an
> octal address and successfully parse it as decimal, ignoring the octal
> prefix. Hence, the resulting object will point to a different IP address
> than it is expected to point to. There's also no direct way to create an
> InetAddress from a literal address with hexadecimal segments, although this
> can be the case in certain systems.

Would this proposal be unique to IPv4 addresses, or is there an equivalent
for IPv6? (I would suspect that there isn't, given that the parsing rules
for IPv6 are a bit more well-defined...)

> Aleksei Efimov contributed JDK-8272215 [2] that adds new factory methods
> .ofLiteral() to InetAddress classes. Although the new API is not affected
> by the getaddrinfo fallback issue, it is not sufficient for an app that
> wants to mimic the behavior of BSD/POSIX network utilities. In particular,
> Java apps that involve parsing or interpreting the parameters of the
> standard tools as well as their configuration / environment.
> It is suggested to add a new factory method such as .ofPosixLiteral() to
> Inet4Address class to fill this gap. This won't introduce ambiguity into
> the API and won't break the long standing behavior.
> As a new method, it will not affect Java utilities such as HttpClient,
> nor the existing Java applications. At the same time, the new method
> will help in dealing with confusion between BSD and Java standards.

I would suggest normatively calling this behavior "POSIX standard" parsing
(not BSD or POSIX/BSD), since it (at least nominally) comes from a standards
body [1]. Bear in mind that `inet_pton` follows different rules though [2].
RFC 6943 [3] has a bit more to say about so-called "loose" vs "strict" IP
address parsing rules.

> The parsing algorithm was added as part of JDK-8277608 [3]. It requires
> minor modification to produce 4 bytes output (now it doesn't produce any
> output). The algorithm allows up to 4 segments separated by dots (.), the
> leading segment(s) must not exceed 255 if there is more than 1 segment,
> the trailing segment must not exceed 256^(5 - numberOfSegments) - 1. The
> algorithm rejects numbers greater than 0xFF hex, 0377 octal, 255 decimal
> per octet. It is different from .ofLiteral() where it is simply 255 per
> octet, regardless of leading 0s (the total length must not exceed 15). In
> .ofPosixLiteral() there'd be no limit on the number of leading 0s, which
> is also the case with inet_addr(). The corner cases for both methods are
> numbers that are accepted in both, but produce different outputs such as
> octal numbers between 010 and 0255. 0256 and above are rejected by
> ofLiteral() as well as all hexadecimal numbers. Zero prefixed decimal
> numbers such as 0239 should be rejected by ofPosixLiteral().
>
> There could be a slight discrepancy in terms of how different standard
> tools work under different OSes. For example, in macOS wget & nc
> disregard the octal prefix (0) while allowing the hexadecimal prefix (0x),
> at the same time curl & ping process both prefixes. In Ubuntu Server 22.04
> both prefixes are processed, but they are not allowed in the /etc/hosts
> file, while in macOS it's legal to use 0x.
> Despite the deviations in how and where the BSD standard is implemented,
> there are two distinct approaches. I don't see why Java shouldn't provide
> two different independent APIs. It would give future apps the flexibility
> to decide which standard to rely on, and the ability to see the full
> picture.
>
> Please share your thoughts on whether such a change might be desirable in
> JDK 23. Thank you for your help!

I guess it could be useful when the need arises to interoperate with tooling
that supports this kind of syntax, and if it was done, I would agree that a
separate method would be the way to go. But I don't have any comment as to
whether the potential use cases are sufficient to justify the API surface
and additional implementation complexity (whatever that may be).

As another random data point: the projects I've been working on have
relegated such extra-JDK IP address handling tasks to a utility library [4].
We don't have a parser for this particular syntax though.

[1] https://pubs.opengroup.org/onlinepubs/009695399/functions/inet_addr.html
[2] https://pubs.opengroup.org/onlinepubs/009695399/functions/inet_pton.html
[3] https://datatracker.ietf.org/doc/html/rfc6943#section-3.1.1
[4] https://github.com/smallrye/smallrye-common/blob/main/net/src/main/java/io/smallrye/common/net/Inet.java

--
- DML • he/him
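
For illustration only, the segment rules quoted above (up to four
dot-separated parts, each in decimal, 0-prefixed octal, or 0x-prefixed hex;
leading parts at most 255; the trailing part at most
256^(5 - numberOfSegments) - 1) could be sketched roughly as follows. This is
not the JDK-8277608 code and not a proposed implementation: the class name
PosixIpv4Sketch and the methods parsePosixLiteral and parsePart are
hypothetical, and returning null for invalid input is just an assumption made
to keep the sketch short.

```java
import java.net.InetAddress;

// Hypothetical sketch of inet_addr-style IPv4 literal parsing; not a JDK API.
final class PosixIpv4Sketch {

    /** Returns the 4 address bytes, or null if the literal is not accepted. */
    static byte[] parsePosixLiteral(String literal) {
        if (literal == null) {
            return null;
        }
        // Limit -1 keeps empty parts, so "1..2" and "1.2." are rejected below.
        String[] parts = literal.split("\\.", -1);
        if (parts.length > 4) {
            return null;
        }
        long address = 0;
        for (int i = 0; i < parts.length; i++) {
            long value = parsePart(parts[i]);
            if (value < 0) {
                return null;
            }
            boolean last = (i == parts.length - 1);
            // Trailing part may fill all remaining bytes: 256^(5 - n) - 1.
            long max = last ? (1L << (8 * (5 - parts.length))) - 1 : 255;
            if (value > max) {
                return null;
            }
            // Leading parts occupy one byte each, from the most significant.
            address |= last ? value : value << (8 * (3 - i));
        }
        return new byte[] {
            (byte) (address >>> 24), (byte) (address >>> 16),
            (byte) (address >>> 8), (byte) address
        };
    }

    /** Parses one part as decimal, octal (0...) or hex (0x...); -1 if invalid. */
    static long parsePart(String part) {
        int radix = 10;
        String digits = part;
        if (part.length() > 2 && (part.startsWith("0x") || part.startsWith("0X"))) {
            radix = 16;
            digits = part.substring(2);
        } else if (part.length() > 1 && part.startsWith("0")) {
            radix = 8; // any number of leading zeros is allowed
            digits = part.substring(1);
        }
        if (digits.isEmpty()) {
            return -1;
        }
        long value = 0;
        for (char c : digits.toCharArray()) {
            // ASCII digits only; e.g. '9' is invalid in an octal part.
            int d = c < 128 ? Character.digit(c, radix) : -1;
            if (d < 0) {
                return -1;
            }
            value = value * radix + d;
            if (value > 0xFFFFFFFFL) {
                return -1; // cannot fit in 32 bits, stop before overflow
            }
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        byte[] octal = parsePosixLiteral("010.0.0.1");
        System.out.println(InetAddress.getByAddress(octal).getHostAddress()); // 8.0.0.1
        System.out.println(parsePosixLiteral("0239.1.1.1") == null);          // true
    }
}
```

With these rules "010.0.0.1" yields 8.0.0.1 rather than the 10.0.0.1 that a
decimal-only reading gives, and "0239.1.1.1" is rejected because 9 is not an
octal digit, which matches the corner cases described in the quoted text.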
