On Wed, 17 Apr 2024 22:23:41 GMT, Sergey Chernyshev <schernys...@openjdk.org> 
wrote:

>> There are two distinct approaches to parsing IPv4 literal addresses. One is
>> the Java baseline "strict" syntax (the all-decimal d.d.d.d family of forms);
>> the other is the "loose" syntax of RFC 6943 section 3.1.1 [1] (POSIX
>> `inet_addr`, which allows octal and hexadecimal forms [2]). The goal of this
>> PR is to provide an interface for constructing InetAddress objects from
>> literal addresses in POSIX form, for applications that need to mimic the
>> behavior of `inet_addr` as used by standard network utilities such as
>> netcat/curl/wget and the majority of web browsers. At present, there's no
>> way to construct an `InetAddress` object from such literal addresses,
>> because the existing APIs such as `InetAddress.getByName()`,
>> `InetAddress#ofLiteral()` and `Inet4Address#ofLiteral()` will consume such
>> an address and successfully parse it as decimal, regardless of the octal
>> prefix. Hence, the resulting object will point to a different IP address.
>> 
>> Historically, `InetAddress.getByName()`/`.getAllByName()` were the only way
>> to convert a literal address into an InetAddress object. The
>> `getAllByName()` API relies on POSIX `getaddrinfo` / `inet_addr`, which
>> parses IP address segments with `strtoul` (accepting octal and hexadecimal
>> bases).
>> 
>> The fallback to `getaddrinfo` is undesirable because, if `inet_addr` rejects
>> the input literal address, it may result in (blocking) network queries. The
>> Java API specification explicitly says that
>> 
>> "If a literal IP address is supplied, only the validity of the address 
>> format is checked."
>> 
>> @AlekseiEfimov contributed JDK-8272215 [3], which adds new `.ofLiteral()`
>> factory methods to the `InetAddress` classes. Although the new API is not
>> affected by the `getaddrinfo` fallback issue, it is not sufficient for an
>> application that needs to interoperate with external tooling that follows
>> the POSIX standard. In their current state, `InetAddress#ofLiteral()` and
>> `Inet4Address#ofLiteral()` will consume the input literal address and
>> (regardless of the octal prefix) parse it as decimal numbers. Hence, it's
>> not possible to reliably construct an `InetAddress` object from a literal
>> address in POSIX form that would point to the desired host.
>> 
>> It is proposed to extend the set of factory methods with
>> `Inet4Address#ofPosixLiteral()`, which allows parsing literal IP(v4)
>> addresses in the "loose" syntax, compatible with the POSIX `inet_addr` API.
>> The implementation is based on the `.isBsdParsableV4()` method added along
>> with JDK-8277608 [4]. The changes in the original algorithm are as follows:
>> 
>> - `IPAddressUtil#parseB...
>
> Sergey Chernyshev has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   addressed more review comments

The sentence at line 73 (of Inet4Address) is no longer correct:

"These forms support parts specified in decimal format only."

"Forms" here refers to the number of components in the address, not the
methods used to parse the address, and the new method also supports the
multiple "forms" of an address.
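To make the "forms" point concrete, here is a minimal, hypothetical sketch (not the JDK code) of the decimal-only forms d, d.d, d.d.d and d.d.d.d: leading parts take one byte each and the last part fills the remaining 32, 24, 16 or 8 bits. The names `DecimalForms`, `parseDecimalForms` and `toDotted` are illustrative assumptions; a leading `0` is deliberately treated as an ordinary decimal digit, which is the behaviour the PR description attributes to the existing methods.

```java
// Hypothetical sketch of the decimal-only IPv4 literal forms; not JDK code.
class DecimalForms {

    // Parses one part as decimal only: a leading 0 is NOT an octal prefix.
    static long parseDecimal(String p) {
        if (p.isEmpty() || p.length() > 10) {
            throw new IllegalArgumentException("bad part: " + p);
        }
        long v = 0;
        for (int i = 0; i < p.length(); i++) {
            char c = p.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("bad part: " + p);
            }
            v = v * 10 + (c - '0');
        }
        return v;
    }

    // Accepts 1 to 4 parts; leading parts are one byte each, the last part
    // fills the remaining 32, 24, 16 or 8 bits.
    static byte[] parseDecimalForms(String s) {
        String[] parts = s.split("\\.", -1); // -1 keeps empty trailing parts
        int n = parts.length;
        if (n > 4) {
            throw new IllegalArgumentException("too many parts: " + s);
        }
        long value = 0;
        for (int i = 0; i < n; i++) {
            long part = parseDecimal(parts[i]);
            int bits = (i < n - 1) ? 8 : 8 * (5 - n);
            if (part >= (1L << bits)) {
                throw new IllegalArgumentException("part out of range: " + s);
            }
            value = (value << bits) | part;
        }
        byte[] addr = new byte[4];
        for (int i = 0; i < 4; i++) {
            addr[i] = (byte) (value >>> (24 - 8 * i));
        }
        return addr;
    }

    static String toDotted(byte[] a) {
        return (a[0] & 0xFF) + "." + (a[1] & 0xFF) + "." + (a[2] & 0xFF) + "." + (a[3] & 0xFF);
    }

    public static void main(String[] args) {
        // "127.1" and "2130706433" both denote 127.0.0.1; "0177.0.0.1" is read as 177.0.0.1.
        for (String s : new String[] {"127.1", "10.1.258", "2130706433", "0177.0.0.1"}) {
            System.out.println(s + " -> " + toDotted(parseDecimalForms(s)));
        }
    }
}
```

The number of parts (the "form") is independent of the radix used for each part, which is why the sentence about decimal-only parts no longer fits once a second parsing method exists.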

I think it might be best to have a new section in the class doc,

"Parsing of literal addresses"

which lists the methods that parse as decimal only and the new method that
parses using the "loose" syntax. The existing snippet showing examples of
decimal-only parsing can then follow. The syntax for loose parsing should
remain in the method definition, imo.
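For contrast with the decimal-only methods, a hypothetical sketch of the "loose" syntax follows, assuming the usual BSD `inet_aton` rules: a `0x`/`0X` prefix selects hexadecimal, any other leading `0` selects octal, otherwise decimal, with the same one-to-four-part forms. `LooseForms`, `parsePart`, `parseLoose` and `toDotted` are illustrative names; this is not the proposed `Inet4Address#ofPosixLiteral()` implementation, and edge cases (for example a bare `0x`, which this sketch rejects) may differ from a given libc.

```java
// Hypothetical sketch of "loose" (inet_aton-style) IPv4 literal parsing; not JDK code.
class LooseForms {

    // Chooses the radix the way strtoul(s, NULL, 0) does:
    // "0x"/"0X" prefix -> hexadecimal, any other leading "0" -> octal,
    // otherwise decimal. Only ASCII digits are accepted.
    static long parsePart(String p) {
        int radix = 10;
        String digits = p;
        if (p.startsWith("0x") || p.startsWith("0X")) {
            radix = 16;
            digits = p.substring(2);
        } else if (p.length() > 1 && p.charAt(0) == '0') {
            radix = 8;
            digits = p.substring(1);
        }
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("empty part: " + p);
        }
        long v = 0;
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            int d = c < 0x80 ? Character.digit(c, radix) : -1;
            if (d < 0) {
                throw new IllegalArgumentException("bad digit for radix " + radix + ": " + p);
            }
            v = v * radix + d;
            if (v > 0xFFFFFFFFL) {
                throw new IllegalArgumentException("part exceeds 32 bits: " + p);
            }
        }
        return v;
    }

    // Same 1-4 part forms as the decimal-only syntax: leading parts are one
    // byte each, the last part fills the remaining 32, 24, 16 or 8 bits.
    static byte[] parseLoose(String s) {
        String[] parts = s.split("\\.", -1);
        int n = parts.length;
        if (n > 4) {
            throw new IllegalArgumentException("too many parts: " + s);
        }
        long value = 0;
        for (int i = 0; i < n; i++) {
            long part = parsePart(parts[i]);
            int bits = (i < n - 1) ? 8 : 8 * (5 - n);
            if (part >= (1L << bits)) {
                throw new IllegalArgumentException("part out of range: " + s);
            }
            value = (value << bits) | part;
        }
        byte[] addr = new byte[4];
        for (int i = 0; i < 4; i++) {
            addr[i] = (byte) (value >>> (24 - 8 * i));
        }
        return addr;
    }

    static String toDotted(byte[] a) {
        return (a[0] & 0xFF) + "." + (a[1] & 0xFF) + "." + (a[2] & 0xFF) + "." + (a[3] & 0xFF);
    }

    public static void main(String[] args) {
        // Each of these denotes 127.0.0.1 under the loose syntax.
        for (String s : new String[] {"0177.0.0.1", "0x7f.1", "0x7f000001"}) {
            System.out.println(s + " -> " + toDotted(parseLoose(s)));
        }
    }
}
```

The same string `0177.0.0.1` yields 127.0.0.1 here but 177.0.0.1 under decimal-only parsing, which is the ambiguity the separate class-doc section would spell out. A caller could pass the resulting 4-byte array to `InetAddress.getByAddress(byte[])`, which does not consult the name service.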

-------------

PR Review: https://git.openjdk.org/jdk/pull/18493#pullrequestreview-2008458196
