Hi Roger,

Thanks for your thoughts; I agree with you. Since this is a utility meant primarily for developers, not end users, limiting the "hexadecimal string/character" to Latin-1 seems reasonable.

Naoto

On 11/30/20 7:42 AM, Roger Riggs wrote:
Hi Naoto,

There are a couple of ways consistency can be achieved (and with what).

The existing conversions from hexadecimal strings to primitives all delegate to Character.digit(ch, radix), which accepts both digits and letters beyond Latin1. (See Integer.valueOf(string, radix), Long.valueOf(string, radix), etc.) The conversions from primitive to string produce only the Latin1 characters "0-9" and "a-f".
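
As a rough illustration of the asymmetry described above, the following sketch uses only existing JDK calls (Character.digit, Integer.parseInt, Integer.toHexString). The class name DigitRangeDemo and the helper parseLenient are made up for this example and are not part of the PR.

```java
// Hypothetical demo class; only the java.lang calls are real JDK API.
final class DigitRangeDemo {

    // Parses a hex string with the existing JDK routine, which relies on
    // Character.digit and therefore accepts non-Latin1 digits.
    static int parseLenient(String s) {
        return Integer.parseInt(s, 16);
    }

    public static void main(String[] args) {
        // ASCII hex letter: prints 15
        System.out.println(Character.digit('f', 16));
        // FULLWIDTH DIGIT ZERO (U+FF10): prints 0
        System.out.println(Character.digit('\uff10', 16));
        // FULLWIDTH LATIN CAPITAL LETTER A (U+FF21): prints 10
        System.out.println(Character.digit('\uff21', 16));
        // Fullwidth "10" parsed as hex: prints 16
        System.out.println(parseLenient("\uff11\uff10"));
        // Formatting goes the other way and emits only 0-9, a-f: prints ff
        System.out.println(Integer.toHexString(255));
    }
}
```

Parsing is lenient about which characters count as digits, while formatting never produces anything outside "0-9", "a-f".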

Making the conversion of strings to and from primitives consistent within HexFormat seems attractive, but it would diverge from the existing conversions, and the non-Latin1 digits and letters almost never appear in practice.

There are use cases (primarily in protocols and RFCs) where the hexadecimal characters are specified as "0-9", "a-f", and "A-F". If HexFormat used Character.digit(ch, radix), it would fail to detect unexpected or illegal characters, rendering HexFormat unusable for those use cases.

Though it would diverge from the existing parsing of hexadecimal in Character, Integer, Long, etc., I'll post an update so that string parsing allows only Latin1 hexadecimal characters.
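
For readers following along, a minimal sketch of what "allowing only Latin1 hexadecimal characters" could look like is shown here. It is an assumption for illustration, not the code in the PR: the class StrictHex, its method, and the choice of NumberFormatException are all hypothetical.

```java
// Hypothetical helper, not the HexFormat implementation from the PR.
final class StrictHex {

    // Returns 0-15 for a character in [0-9a-fA-F]; rejects everything else,
    // including non-Latin1 digits that Character.digit would accept.
    static int fromHexDigit(int ch) {
        if (ch >= '0' && ch <= '9') {
            return ch - '0';
        }
        if (ch >= 'a' && ch <= 'f') {
            return ch - 'a' + 10;
        }
        if (ch >= 'A' && ch <= 'F') {
            return ch - 'A' + 10;
        }
        // Exception type chosen for the sketch only.
        throw new NumberFormatException(
                "not a hexadecimal digit: U+" + Integer.toHexString(ch));
    }

    public static void main(String[] args) {
        System.out.println(fromHexDigit('7'));   // prints 7
        System.out.println(fromHexDigit('c'));   // prints 12
        System.out.println(fromHexDigit('F'));   // prints 15
        try {
            fromHexDigit('\uff10');              // FULLWIDTH DIGIT ZERO
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With explicit range checks like these, a fullwidth digit is reported as an error instead of being silently read as a valid value, which is the behavior the protocol and RFC use cases need.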

Comments?

Thanks, Roger



On 11/27/20 5:43 PM, Naoto Sato wrote:
On Fri, 27 Nov 2020 16:57:07 GMT, Roger Riggs <rri...@openjdk.org> wrote:

src/java.base/share/classes/java/util/HexFormat.java line 853:

851:      */
852:     public int fromHexDigit(int ch) {
853:         int value = Character.digit(ch, 16);
Do we need to limit parsing of the hex digit to only [0-9a-fA-F]? This would return `0` for other digits, such as `fullwidth digit zero` (U+FF10).
The normal and conventional characters for hex encoding are limited to the ASCII/Latin1 range. I don't know of any use case that would take advantage of non-ASCII characters.
My point is that we should probably define `hexadecimal string` more clearly. In the class description, it exclusively means [0-9a-fA-F] in the context of formatting, but in parsing it allows non-ASCII digits, e.g.,
HexFormat.of().parseHex("\uff10\uff11")
Succeeds. I would like consistency here.

-------------

PR: https://git.openjdk.java.net/jdk/pull/482
