On Wed, 6 Sep 2023 13:36:22 GMT, Claes Redestad <[email protected]> wrote:
> This PR seeks to improve the formatting of hex digits using `java.util.HexFormat`
> somewhat.
>
> This is achieved by getting rid of a couple of lookup tables, caching the result
> of `HexFormat.of().withUpperCase()`, and removing a tiny allocation that
> happens in the `formatHex(A, byte)` method. Throughput improvements range from
> 20-40%, and some operations allocate less:
>
>
> Name                                Cnt    Base   Error    Test   Error   Unit     Diff%
> HexFormatBench.appenderLower         15   1,330 ± 0,021   1,065 ± 0,067   us/op    19,9% (p = 0,000*)
>   :gc.alloc.rate                     15  11,481 ± 0,185   0,007 ± 0,000   MB/sec  -99,9% (p = 0,000*)
>   :gc.alloc.rate.norm                15  16,009 ± 0,000   0,007 ± 0,000   B/op   -100,0% (p = 0,000*)
>   :gc.count                          15   3,000           0,000           counts
>   :gc.time                            3   2,000                           ms
> HexFormatBench.appenderLowerCached   15   1,317 ± 0,013   1,065 ± 0,054   us/op    19,1% (p = 0,000*)
>   :gc.alloc.rate                     15  11,590 ± 0,111   0,007 ± 0,000   MB/sec  -99,9% (p = 0,000*)
>   :gc.alloc.rate.norm                15  16,009 ± 0,000   0,007 ± 0,000   B/op   -100,0% (p = 0,000*)
>   :gc.count                          15   3,000           0,000           counts
>   :gc.time                            3   2,000                           ms
> HexFormatBench.appenderUpper         15   1,330 ± 0,022   1,065 ± 0,036   us/op    19,9% (p = 0,000*)
>   :gc.alloc.rate                     15  34,416 ± 0,559   0,007 ± 0,000   MB/sec -100,0% (p = 0,000*)
>   :gc.alloc.rate.norm                15  48,009 ± 0,000   0,007 ± 0,000   B/op   -100,0% (p = 0,000*)
>   :gc.count                          15   0,000           0,000           counts
> HexFormatBench.appenderUpperCached   15   1,353 ± 0,009   1,033 ± 0,014   us/op    23,6% (p = 0,000*)
>   :gc.alloc.rate                     15  11,284 ± 0,074   0,007 ± 0,000   MB/sec  -99,9% (p = 0,000*)
>   :gc.alloc.rate.norm                15  16,009 ± 0,000   0,007 ± 0,000   B/op   -100,0% (p = 0,000*)
>   :gc.count                          15   3,000           0,000           counts
>   :gc.time                            3   2,000                           ms
> HexFormatBench.toHexLower            15   0,198 ± 0,001   0,119 ± 0,008   us/op    40,1% (p = 0,000*)
>   :gc.alloc.rate                     15   0,007 ± 0,000   0,007 ± 0,000   MB/sec   -0,0% (p = 0,816 )
>   :gc.alloc.rate.norm                15   0,001 ± 0,000   0,001 ± 0,000   B/op    -40,1% (p = 0,000*)
>   :gc.count                          15   0,000           0,000           ...
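For readers outside the JDK, the caching described in the PR can be sketched from the caller's side. This is an illustrative assumption (class name and the caller-side caching are mine; the PR itself caches the uppercase instance inside `java.util.HexFormat`), but it shows the appender and uppercase paths the benchmark exercises:

```java
import java.util.HexFormat;

public class HexFormatCacheDemo {
    // Sketch: reuse one uppercase formatter instead of calling
    // HexFormat.of().withUpperCase() on every operation.
    private static final HexFormat UPPER = HexFormat.of().withUpperCase();

    public static void main(String[] args) {
        byte[] data = {(byte) 0xCA, (byte) 0xFE};
        StringBuilder sb = new StringBuilder();
        UPPER.formatHex(sb, data);          // appender path, as benchmarked above
        System.out.println(sb);             // CAFE
        System.out.println(UPPER.toHexDigits((byte) 0x5F)); // 5F
    }
}
```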
I also tried variants of this for `toHexDigits(short|int|long)`, but it always
ends up worse, both when chunking two digits at a time:
    ByteArrayLittleEndian.setChar(rep, 2,
            (char)(toHighHexDigit(value) << 8 | toLowHexDigit(value)));
    value >>= 8;
    ByteArrayLittleEndian.setChar(rep, 0,
            (char)(toHighHexDigit(value) << 8 | toLowHexDigit(value)));
and when doing it all in one go (this code is from `toHexDigits(short)`, which
writes all four nibbles at once):
    ByteArrayLittleEndian.setInt(rep, 0,
            toHighHexDigit(value >> 8) << 24 | toLowHexDigit(value >> 8) << 16
                    | toHighHexDigit(value) << 8 | toLowHexDigit(value));
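Since `ByteArrayLittleEndian` is JDK-internal, a stand-alone equivalent of the single-store idea can be sketched with a little-endian `VarHandle` view. The class and helper names below are hypothetical stand-ins for the JDK's internal helpers, and the digit order is arranged for a little-endian store (leading digit in the int's low-order byte):

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class HexPackDemo {
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    private static final byte[] DIGITS =
            "0123456789abcdef".getBytes(StandardCharsets.ISO_8859_1);

    // Hypothetical stand-ins for the JDK's internal digit helpers;
    // masking with 0xf also discards any sign extension.
    static byte toHighHexDigit(int value) { return DIGITS[(value >> 4) & 0xf]; }
    static byte toLowHexDigit(int value)  { return DIGITS[value & 0xf]; }

    // Writes all four hex digits of a short with one little-endian int
    // store: the int's lowest byte lands at rep[0], so the leading digit
    // goes into the low-order byte.
    static String toHexDigits(short value) {
        byte[] rep = new byte[4];
        INT_LE.set(rep, 0,
                toHighHexDigit(value >> 8)
              | toLowHexDigit(value >> 8) << 8
              | toHighHexDigit(value) << 16
              | toLowHexDigit(value) << 24);
        return new String(rep, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(toHexDigits((short) 0x5a3c)); // 5a3c
    }
}
```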
It is only a win for `toHexDigits(byte)`:
Name                             Cnt   Base   Error   Test    Error   Unit   Diff%
HexFormatBench.toHexDigitsByte    15  1,992 ± 0,008   1,908 ± 0,053   us/op    4,2% (p = 0,000*)
HexFormatBench.toHexDigitsInt     15  2,476 ± 0,018   2,896 ± 0,023   us/op  -16,9% (p = 0,000*)
HexFormatBench.toHexDigitsLong    15  3,992 ± 0,052   4,229 ± 0,036   us/op   -5,9% (p = 0,000*)
HexFormatBench.toHexDigitsShort   15  2,183 ± 0,018   2,800 ± 0,162   us/op  -28,3% (p = 0,000*)

* = significant
This indicates that any win here comes from tickling the JIT the right way
rather than from some intrinsic property of `ByteArrayLittleEndian`. I'll leave
the code unchanged but add these microbenchmarks in case anyone wants to attempt
further improvements.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/15591#issuecomment-1711436585