On Fri, 15 Sep 2023 18:04:29 GMT, 温绍锦 <[email protected]> wrote:
> In the improvement of @cl4es PR #15591, the advantages of non-lookup-table
> were discussed.
>
> But if the input is byte[], using lookup table can improve performance.
>
> For HexFormat#formatHex(Appendable, byte[]) and HexFormat#formatHex(byte[]),
> If the length of byte[] is larger, the performance of table lookup will be
> improved more obviously.
What numbers do you get with this?
In my experiments the improvement was either negligible or small enough to
hardly matter. I also tried flipping bits similar to what you're doing for
uppercasing but saw a net regression from that.
I opted for simplicity in #15991 - a simple cleanup that removed a few tiny
lookup-tables and still led to a decent improvement. Going the other direction
seems like the wrong move.
Results of an experiment to remove the lookup table use from
`StringBuilder::appendHex`:
-p size=512
Benchmark (size) Mode Cnt Score Error Units
HexFormatBench.appenderLowerCached 512 avgt 15 1,629 ± 0,378 us/op #
baseline
HexFormatBench.appenderLowerCached 512 avgt 15 0,222 ± 0,009 us/op #
15768, 7.3x
HexFormatBench.appenderLowerCached 512 avgt 15 0,366 ± 0,002 us/op #
no_lookup_tables.diff, 4.5x
-p size=4
Benchmark (size) Mode Cnt Score Error Units
HexFormatBench.appenderLowerCached 4 avgt 5 26,723 ± 1,042 ns/op
HexFormatBench.appenderLowerCached 4 avgt 5 7,492 ± 0,140 ns/op #
15768, 3.6x
HexFormatBench.appenderLowerCached 4 avgt 5 10,205 ± 0,055 ns/op #
no_lookup_tables.diff, 2.6x
Most of the win is from having a loop that is easier for the JIT to optimize;
the lookup-table impl is faster than no lookup table in these micros, by a
factor that is in the same ballpark as your `format` micro numbers. This on a
M1. You might get different numbers.
I'm not convinced 30-100% wins on these micro is worth the increase in cache
traffic, and would like to see a deeper analysis of the pros and cons of lookup
tables before we go ahead with any optimizations that depend on their use.
The `appendHex` method in `StringBuilder` is interesting, but has other
maintainability issues. It's a slippery slope to start adding specialized
internal routines to `StringBuilder`, and perhaps there are better ways to
model this at a higher level.
no_lookup_table.diff:
diff --git a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
index 57a16bb769a..70727375c0b 100644
--- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
+++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
@@ -1825,6 +1825,15 @@ private final void appendChars(CharSequence s, int off,
int end) {
count += end - off;
}
+ private static char toHighHexDigit(boolean ucase, int value) {
+ value = (value >> 4) & 0xf;
+ return (char) ((value < 10 ? '0' : ucase ? 'A' : 'a') + value);
+ }
+ private static char toLowHexDigit(boolean ucase, int value) {
+ value = (value) & 0xf;
+ return (char) ((value < 10 ? '0' : ucase ? 'A' : 'a') + value);
+ }
+
final void appendHex(boolean ucase, byte[] bytes, int fromIndex, int
toIndex) {
Preconditions.checkFromToIndex(fromIndex, toIndex, bytes.length,
Preconditions.IOOBE_FORMATTER);
@@ -1832,17 +1841,14 @@ final void appendHex(boolean ucase, byte[] bytes, int
fromIndex, int toIndex) {
int charPos = count;
for (int i = fromIndex; i < toIndex; ++i) {
- short packed = HexDigits.digitPair(bytes[i], ucase);
+ byte b = bytes[i];
if (isLatin1()) {
- ByteArrayLittleEndian.setShort(
- value,
- charPos,
- packed);
+ StringLatin1.putChar(value, charPos++, toHighHexDigit(ucase,
b));
+ StringLatin1.putChar(value, charPos++, toLowHexDigit(ucase,
b));
} else {
- StringUTF16.putChar(value, charPos, packed & 0xff);
- StringUTF16.putChar(value, charPos + 1, packed >> 8);
+ StringUTF16.putChar(value, charPos++, toHighHexDigit(ucase,
b));
+ StringUTF16.putChar(value, charPos++, toLowHexDigit(ucase, b));
}
- charPos += 2;
}
count = charPos;
I don't think `appendHex` would make the cut as a public API. I think we would
need something with much more general utility for that.
A more high-level idea would be to improve said `String` templates JEP to be
able to format more generally into `Appendable`/`StringBuilder`, so that hex,
octal - any custom format - can be efficiently appended to a `StringBuilder`.
BTW, here's a patch on top of #15768 that simplifies `formatHex(Appendable..)`
to simply append the result of `formatHex(byte[]...)`:
Benchmark (size) Mode Cnt
Score Error Units
HexFormatBench.appenderLowerCached 512 avgt 15
0,202 ± 0,001 us/op
HexFormatBench.formatLowerCached 512 avgt 15
0,189 ± 0,016 us/op
Same speed. Drawback is that the `appender` becomes allocating but short-lived
allocations is one of my least concerns:
```java
diff --git a/src/java.base/share/classes/java/util/HexFormat.java
b/src/java.base/share/classes/java/util/HexFormat.java
index e1d88e3ceb6..68a15307fe0 100644
--- a/src/java.base/share/classes/java/util/HexFormat.java
+++ b/src/java.base/share/classes/java/util/HexFormat.java
@@ -407,38 +407,7 @@ public <A extends Appendable> A formatHex(A out, byte[]
bytes, int fromIndex, in
int length = toIndex - fromIndex;
if (length > 0) {
try {
- boolean prefixEmpty = prefix.isEmpty();
- boolean suffixEmpty = suffix.isEmpty();
- boolean delimiterEmpty = delimiter.isEmpty();
- if (!prefixEmpty) {
- out.append(prefix);
- }
- if (suffixEmpty && delimiterEmpty && prefixEmpty) {
- if (out instanceof StringBuilder sb) {
- jla.appendHex(sb, digitCase == Case.UPPERCASE, bytes,
fromIndex, toIndex);
- } else {
- for (int i = 0; i < length; i++) {
- toHexDigits(out, bytes[fromIndex + i]);
- }
- }
- } else {
- toHexDigits(out, bytes[fromIndex]);
- for (int i = 1; i < length; i++) {
- if (!suffixEmpty) {
- out.append(suffix);
- }
- if (!delimiterEmpty) {
- out.append(delimiter);
- }
- if (!prefixEmpty) {
- out.append(prefix);
- }
- toHexDigits(out, bytes[fromIndex + i]);
- }
- }
- if (!suffixEmpty) {
- out.append(suffix);
- }
+ out.append(formatHex(bytes, fromIndex, toIndex));
} catch (IOException ioe) {
throw new UncheckedIOException(ioe.getMessage(), ioe);
}
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721672736
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721851723
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1722048682