On Fri, 15 Sep 2023 18:04:29 GMT, 温绍锦 <d...@openjdk.org> wrote:
> In the improvement of @cl4es PR #15591, the advantages of non-lookup-table > were discussed. > > But if the input is byte[], using lookup table can improve performance. > > For HexFormat#formatHex(Appendable, byte[]) and HexFormat#formatHex(byte[]), > If the length of byte[] is larger, the performance of table lookup will be > improved more obviously. What numbers do you get with this? In my experiments the improvement was either negligible or small enough to hardly matter. I also tried flipping bits similar to what you're doing for uppercasing but saw a net regression from that. I opted for simplicity in #15991 - a simple cleanup that removed a few tiny lookup-tables and still led to a decent improvement. Going the other direction seems like the wrong move. Results of an experiment to remove the lookup table use from `StringBuilder::appendHex`: -p size=512 Benchmark (size) Mode Cnt Score Error Units HexFormatBench.appenderLowerCached 512 avgt 15 1,629 ± 0,378 us/op # baseline HexFormatBench.appenderLowerCached 512 avgt 15 0,222 ± 0,009 us/op # 15768, 7.3x HexFormatBench.appenderLowerCached 512 avgt 15 0,366 ± 0,002 us/op # no_lookup_tables.diff, 4.5x -p size=4 Benchmark (size) Mode Cnt Score Error Units HexFormatBench.appenderLowerCached 4 avgt 5 26,723 ± 1,042 ns/op HexFormatBench.appenderLowerCached 4 avgt 5 7,492 ± 0,140 ns/op # 15768, 3.6x HexFormatBench.appenderLowerCached 4 avgt 5 10,205 ± 0,055 ns/op # no_lookup_tables.diff, 2.6x Most of the win is from having a loop that is easier for the JIT to optimize; the lookup-table impl is faster than no lookup table in these micros, by a factor that is in the same ballpark as your `format` micro numbers. This on a M1. You might get different numbers. I'm not convinced 30-100% wins on these micro is worth the increase in cache traffic, and would like to see a deeper analysis of the pros and cons of lookup tables before we go ahead with any optimizations that depend on their use. The `appendHex` method in `StringBuilder` is interesting, but has other maintainability issues. It's a slippery slope to start adding specialized internal routines to `StringBuilder`, and perhaps there are better ways to model this at a higher level. no_lookup_table.diff: diff --git a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java index 57a16bb769a..70727375c0b 100644 --- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java +++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java @@ -1825,6 +1825,15 @@ private final void appendChars(CharSequence s, int off, int end) { count += end - off; } + private static char toHighHexDigit(boolean ucase, int value) { + value = (value >> 4) & 0xf; + return (char) ((value < 10 ? '0' : ucase ? 'A' : 'a') + value); + } + private static char toLowHexDigit(boolean ucase, int value) { + value = (value) & 0xf; + return (char) ((value < 10 ? '0' : ucase ? 'A' : 'a') + value); + } + final void appendHex(boolean ucase, byte[] bytes, int fromIndex, int toIndex) { Preconditions.checkFromToIndex(fromIndex, toIndex, bytes.length, Preconditions.IOOBE_FORMATTER); @@ -1832,17 +1841,14 @@ final void appendHex(boolean ucase, byte[] bytes, int fromIndex, int toIndex) { int charPos = count; for (int i = fromIndex; i < toIndex; ++i) { - short packed = HexDigits.digitPair(bytes[i], ucase); + byte b = bytes[i]; if (isLatin1()) { - ByteArrayLittleEndian.setShort( - value, - charPos, - packed); + StringLatin1.putChar(value, charPos++, toHighHexDigit(ucase, b)); + StringLatin1.putChar(value, charPos++, toLowHexDigit(ucase, b)); } else { - StringUTF16.putChar(value, charPos, packed & 0xff); - StringUTF16.putChar(value, charPos + 1, packed >> 8); + StringUTF16.putChar(value, charPos++, toHighHexDigit(ucase, b)); + StringUTF16.putChar(value, charPos++, toLowHexDigit(ucase, b)); } - charPos += 2; } count = charPos; I don't think `appendHex` would make the cut as a public API. I think we would need something with much more general utility for that. A more high-level idea would be to improve said `String` templates JEP to be able to format more generally into `Appendable`/`StringBuilder`, so that hex, octal - any custom format - can be efficiently appended to a `StringBuilder`. BTW, here's a patch on top of #15768 that simplifies `formatHex(Appendable..)` to simply append the result of `formatHex(byte[]...)`: Benchmark (size) Mode Cnt Score Error Units HexFormatBench.appenderLowerCached 512 avgt 15 0,202 ± 0,001 us/op HexFormatBench.formatLowerCached 512 avgt 15 0,189 ± 0,016 us/op Same speed. Drawback is that the `appender` becomes allocating but short-lived allocations is one of my least concerns: ```java diff --git a/src/java.base/share/classes/java/util/HexFormat.java b/src/java.base/share/classes/java/util/HexFormat.java index e1d88e3ceb6..68a15307fe0 100644 --- a/src/java.base/share/classes/java/util/HexFormat.java +++ b/src/java.base/share/classes/java/util/HexFormat.java @@ -407,38 +407,7 @@ public <A extends Appendable> A formatHex(A out, byte[] bytes, int fromIndex, in int length = toIndex - fromIndex; if (length > 0) { try { - boolean prefixEmpty = prefix.isEmpty(); - boolean suffixEmpty = suffix.isEmpty(); - boolean delimiterEmpty = delimiter.isEmpty(); - if (!prefixEmpty) { - out.append(prefix); - } - if (suffixEmpty && delimiterEmpty && prefixEmpty) { - if (out instanceof StringBuilder sb) { - jla.appendHex(sb, digitCase == Case.UPPERCASE, bytes, fromIndex, toIndex); - } else { - for (int i = 0; i < length; i++) { - toHexDigits(out, bytes[fromIndex + i]); - } - } - } else { - toHexDigits(out, bytes[fromIndex]); - for (int i = 1; i < length; i++) { - if (!suffixEmpty) { - out.append(suffix); - } - if (!delimiterEmpty) { - out.append(delimiter); - } - if (!prefixEmpty) { - out.append(prefix); - } - toHexDigits(out, bytes[fromIndex + i]); - } - } - if (!suffixEmpty) { - out.append(suffix); - } + out.append(formatHex(bytes, fromIndex, toIndex)); } catch (IOException ioe) { throw new UncheckedIOException(ioe.getMessage(), ioe); } ------------- PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721672736 PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721851723 PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1722048682