Hi Roger,
Thanks for your thoughts; I agree with you. Since this is a utility
primarily meant for developers, not end users, limiting the "hexadecimal
string/character" to Latin-1 seems reasonable.
Naoto
On 11/30/20 7:42 AM, Roger Riggs wrote:
Hi Naoto,
There are a couple of ways consistency can be achieved (and with what).
The existing conversions from hexadecimal strings to primitives all
delegate to Character.digit(ch, radix), which allows
both digits and letters beyond Latin1. (See Integer.valueOf(string,
radix), Long.valueOf(string, radix), etc.)
Conversions from primitive to string, by contrast, produce only the
Latin1 characters "0-9" and "a-f".
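
To make the asymmetry concrete, here is a small sketch using only existing JDK methods. The class name LenientHexDemo and the helper lenientDigit are made up for illustration and are not part of the PR.

```java
// Hypothetical demo class; only Character.digit, Integer.parseInt and
// Integer.toHexString are real JDK APIs here.
class LenientHexDemo {
    // Value of one character as a hex digit per Character.digit, or -1 if none.
    public static int lenientDigit(int ch) {
        return Character.digit(ch, 16);
    }

    public static void main(String[] args) {
        System.out.println(lenientDigit('f'));       // ASCII letter: 15
        System.out.println(lenientDigit('\uFF10'));  // FULLWIDTH DIGIT ZERO: 0
        System.out.println(lenientDigit('\uFF21'));  // FULLWIDTH LATIN CAPITAL LETTER A: 10
        // Parsing accepts the fullwidth digits "10" and yields 0x10.
        System.out.println(Integer.parseInt("\uFF11\uFF10", 16)); // 16
        // Formatting only ever emits "0-9" and "a-f".
        System.out.println(Integer.toHexString(255)); // ff
    }
}
```

So a string that the formatting side could never produce is still accepted on the parsing side.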
Making the conversion of strings to and from primitives consistent
within HexFormat seems attractive,
but it would diverge from the existing conversions, and in practice the
non-Latin1 digits and letters almost never appear.
There are use cases (primarily in protocols and RFCs) where the
hexadecimal characters are
specified as "0-9", "a-f", and "A-F". If HexFormat used
Character.digit(ch, radix) it would fail
to detect unexpected or illegal characters, which would render HexFormat
unusable for those use cases.
Though it would diverge from the existing parsing of
hexadecimal in Character, Integer, Long, etc.,
I'll post an update in which string parsing allows only Latin1
hexadecimal characters.
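
As a rough sketch of what Latin1-only parsing could look like: the names StrictHexDemo, strictHexDigit and strictParseHex are hypothetical, and this is not the code in the PR, just one way to reject anything outside "0-9", "a-f", "A-F".

```java
// Hypothetical strict parser; not the HexFormat implementation.
class StrictHexDemo {
    // Returns 0-15 for [0-9a-fA-F] and throws for every other character.
    public static int strictHexDigit(int ch) {
        if (ch >= '0' && ch <= '9') return ch - '0';
        if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
        if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
        throw new NumberFormatException(
                "not a hexadecimal digit: U+" + Integer.toHexString(ch));
    }

    // Parses pairs of hex digits into bytes, most significant digit first.
    public static byte[] strictParseHex(CharSequence s) {
        if ((s.length() & 1) != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = strictHexDigit(s.charAt(2 * i));
            int lo = strictHexDigit(s.charAt(2 * i + 1));
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(strictParseHex("0aFF").length);
        try {
            strictParseHex("\uFF10\uFF11"); // fullwidth "01"
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With range checks like these, the fullwidth digits that Character.digit would accept are reported as errors instead of being silently converted.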
Comments?
Thanks, Roger
On 11/27/20 5:43 PM, Naoto Sato wrote:
On Fri, 27 Nov 2020 16:57:07 GMT, Roger Riggs <rri...@openjdk.org> wrote:
src/java.base/share/classes/java/util/HexFormat.java line 853:
851: */
852: public int fromHexDigit(int ch) {
853: int value = Character.digit(ch, 16);
Should we limit hex digit parsing to only [0-9a-fA-F]? As written, this
would return `0` for other digits, say `fullwidth digit zero` (U+FF10).
The normal and conventional characters for hex encoding are limited
to the ASCII/Latin1 range.
I don't know of any use case that would take advantage of non-ASCII
characters.
My point is that we should probably define `hexadecimal string` more
clearly. In the class description it exclusively means [0-9a-fA-F]
in the context of formatting, but parsing allows non-ASCII
digits, e.g.,
HexFormat.of().parseHex("\uff10\uff11")
succeeds. I would like consistency here.
-------------
PR: https://git.openjdk.java.net/jdk/pull/482