Re: RFR: 8316426: Optimization for HexFormat.formatHex

Claes Redestad Mon, 18 Sep 2023 08:46:02 -0700

On Fri, 15 Sep 2023 18:04:29 GMT, 温绍锦 <d...@openjdk.org> wrote:


> In the improvement of @cl4es PR #15591, the advantages of non-lookup-table 
> were discussed.
> 
> But if the input is byte[], using lookup table can improve performance.
> 
> For HexFormat#formatHex(Appendable, byte[]) and HexFormat#formatHex(byte[]), 
> If the length of byte[] is larger, the performance of table lookup will be 
> improved more obviously.

What numbers do you get with this? 

In my experiments the improvement was either negligible or small enough to 
hardly matter. I also tried flipping bits similar to what you're doing for 
uppercasing but saw a net regression from that.

I opted for simplicity in #15991 - a simple cleanup that removed a few tiny 
lookup-tables and still led to a decent improvement. Going the other direction 
seems like the wrong move.

Results of an experiment to remove the lookup table use from 
`StringBuilder::appendHex`:

-p size=512
Benchmark                           (size)  Mode  Cnt  Score   Error  Units
HexFormatBench.appenderLowerCached     512  avgt   15  1,629 ± 0,378  us/op # 
baseline
HexFormatBench.appenderLowerCached     512  avgt   15  0,222 ± 0,009  us/op # 
15768, 7.3x
HexFormatBench.appenderLowerCached     512  avgt   15  0,366 ± 0,002  us/op # 
no_lookup_tables.diff, 4.5x

-p size=4
Benchmark                           (size)  Mode  Cnt   Score   Error  Units
HexFormatBench.appenderLowerCached       4  avgt    5  26,723 ± 1,042  ns/op
HexFormatBench.appenderLowerCached       4  avgt    5   7,492 ± 0,140  ns/op # 
15768, 3.6x
HexFormatBench.appenderLowerCached       4  avgt    5  10,205 ± 0,055  ns/op # 
no_lookup_tables.diff, 2.6x


Most of the win is from having a loop that is easier for the JIT to optimize; 
the lookup-table impl is faster than no lookup table in these micros, by a 
factor that is in the same ballpark as your `format` micro numbers. This on a 
M1. You might get different numbers.

I'm not convinced 30-100% wins on these micro is worth the increase in cache 
traffic, and would like to see a deeper analysis of the pros and cons of lookup 
tables before we go ahead with any optimizations that depend on their use.

The `appendHex` method in `StringBuilder` is interesting, but has other 
maintainability issues. It's a slippery slope to start adding specialized 
internal routines to `StringBuilder`, and perhaps there are better ways to 
model this at a higher level.

no_lookup_table.diff:

diff --git a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java 
b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
index 57a16bb769a..70727375c0b 100644
--- a/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
+++ b/src/java.base/share/classes/java/lang/AbstractStringBuilder.java
@@ -1825,6 +1825,15 @@ private final void appendChars(CharSequence s, int off, 
int end) {
         count += end - off;
     }

+    private static char toHighHexDigit(boolean ucase, int value) {
+        value = (value >> 4) & 0xf;
+        return (char) ((value < 10 ? '0' : ucase ? 'A' : 'a') + value);
+    }
+    private static char toLowHexDigit(boolean ucase, int value) {
+        value = (value) & 0xf;
+        return (char) ((value < 10 ? '0' : ucase ? 'A' : 'a') + value);
+    }
+
     final void appendHex(boolean ucase, byte[] bytes, int fromIndex, int 
toIndex) {
         Preconditions.checkFromToIndex(fromIndex, toIndex, bytes.length, 
Preconditions.IOOBE_FORMATTER);

@@ -1832,17 +1841,14 @@ final void appendHex(boolean ucase, byte[] bytes, int 
fromIndex, int toIndex) {

         int charPos = count;
         for (int i = fromIndex; i < toIndex; ++i) {
-            short packed = HexDigits.digitPair(bytes[i], ucase);
+            byte b = bytes[i];
             if (isLatin1()) {
-                ByteArrayLittleEndian.setShort(
-                        value,
-                        charPos,
-                        packed);
+                StringLatin1.putChar(value, charPos++, toHighHexDigit(ucase, 
b));
+                StringLatin1.putChar(value, charPos++, toLowHexDigit(ucase, 
b));
             } else {
-                StringUTF16.putChar(value, charPos, packed & 0xff);
-                StringUTF16.putChar(value, charPos + 1, packed >> 8);
+                StringUTF16.putChar(value, charPos++, toHighHexDigit(ucase, 
b));
+                StringUTF16.putChar(value, charPos++, toLowHexDigit(ucase, b));
             }
-            charPos += 2;
         }

         count = charPos;

I don't think `appendHex` would make the cut as a public API. I think we would 
need something with much more general utility for that.

A more high-level idea would be to improve said `String` templates JEP to be 
able to format more generally into `Appendable`/`StringBuilder`, so that hex, 
octal - any custom format - can be efficiently appended to a `StringBuilder`. 

BTW, here's a patch on top of #15768 that simplifies `formatHex(Appendable..)` 
to simply append the result of `formatHex(byte[]...)`:


Benchmark                                              (size)  Mode  Cnt     
Score     Error   Units
HexFormatBench.appenderLowerCached                        512  avgt   15     
0,202 ±   0,001   us/op
HexFormatBench.formatLowerCached                          512  avgt   15     
0,189 ±   0,016   us/op


Same speed. Drawback is that the `appender` becomes allocating but short-lived 
allocations is one of my least concerns:

```java 
diff --git a/src/java.base/share/classes/java/util/HexFormat.java 
b/src/java.base/share/classes/java/util/HexFormat.java
index e1d88e3ceb6..68a15307fe0 100644
--- a/src/java.base/share/classes/java/util/HexFormat.java
+++ b/src/java.base/share/classes/java/util/HexFormat.java
@@ -407,38 +407,7 @@ public <A extends Appendable> A formatHex(A out, byte[] 
bytes, int fromIndex, in
         int length = toIndex - fromIndex;
         if (length > 0) {
             try {
-                boolean prefixEmpty = prefix.isEmpty();
-                boolean suffixEmpty = suffix.isEmpty();
-                boolean delimiterEmpty = delimiter.isEmpty();
-                if (!prefixEmpty) {
-                    out.append(prefix);
-                }
-                if (suffixEmpty && delimiterEmpty && prefixEmpty) {
-                    if (out instanceof StringBuilder sb) {
-                        jla.appendHex(sb, digitCase == Case.UPPERCASE, bytes, 
fromIndex, toIndex);
-                    } else {
-                        for (int i = 0; i < length; i++) {
-                            toHexDigits(out, bytes[fromIndex + i]);
-                        }
-                    }
-                } else {
-                    toHexDigits(out, bytes[fromIndex]);
-                    for (int i = 1; i < length; i++) {
-                        if (!suffixEmpty) {
-                            out.append(suffix);
-                        }
-                        if (!delimiterEmpty) {
-                            out.append(delimiter);
-                        }
-                        if (!prefixEmpty) {
-                            out.append(prefix);
-                        }
-                        toHexDigits(out, bytes[fromIndex + i]);
-                    }
-                }
-                if (!suffixEmpty) {
-                    out.append(suffix);
-                }
+                out.append(formatHex(bytes, fromIndex, toIndex));
             } catch (IOException ioe) {
                 throw new UncheckedIOException(ioe.getMessage(), ioe);
             }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721672736
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1721851723
PR Comment: https://git.openjdk.org/jdk/pull/15768#issuecomment-1722048682

Re: RFR: 8316426: Optimization for HexFormat.formatHex

Reply via email to