Hi Sherman,

On 2017-12-09 00:09, Xueming Shen wrote:
> Hi,
>
> Please help review the changes for j.u.z.ZipCoder/JDK-8184947 (which also includes
> cleanup/improvement work in java.lang.StringCoding.java to speed up general String
> coding performance, especially for UTF8).
>
> issue: https://bugs.openjdk.java.net/browse/JDK-8184947
> webrev: http://cr.openjdk.java.net/~sherman/8184947/webrev

I've not fully reviewed this yet, but something struck me halfway through: As the ASCII fast-path is what's really important here, we could write that part without ever having
to go via a StringCoding.Result.

On four of your ZipCodingBM micros this improves performance a bit further (~10%):

diff -r 848591d85052 src/java.base/share/classes/java/lang/StringCoding.java
--- a/src/java.base/share/classes/java/lang/StringCoding.java    Sun Dec 10 18:48:21 2017 +0100
+++ b/src/java.base/share/classes/java/lang/StringCoding.java    Sun Dec 10 18:55:38 2017 +0100
@@ -937,7 +937,13 @@
      * Throws iae, instead of replacing, if malformed or unmmappble.
      */
     static String newStringUTF8NoRepl(byte[] bytes, int off, int len) {
-        Result ret = decodeUTF8(bytes, off, len, false);
+        if (COMPACT_STRINGS && !hasNegatives(bytes, off, len)) {
+            return new String(Arrays.copyOfRange(bytes, off, off + len), LATIN1);
+        }
+        Result ret = decodeUTF8_0(bytes, off, len, false);
         return new String(ret.value, ret.coder);
     }

Before:
Benchmark                Mode  Cnt    Score   Error  Units
ZipCodingBM.jf_entries   avgt   25   43.682 ± 0.656  us/op
ZipCodingBM.jf_stream    avgt   25   42.075 ± 0.444  us/op
ZipCodingBM.zf_entries   avgt   25   43.323 ± 0.572  us/op
ZipCodingBM.zf_stream    avgt   25   40.237 ± 0.604  us/op

After:
Benchmark                Mode  Cnt    Score   Error  Units
ZipCodingBM.jf_entries   avgt   25   37.551 ± 1.198  us/op
ZipCodingBM.jf_stream    avgt   25   38.065 ± 0.628  us/op
ZipCodingBM.zf_entries   avgt   25   37.595 ± 0.686  us/op
ZipCodingBM.zf_stream    avgt   25   35.734 ± 0.442  us/op

(I don't know which jar you're using as test.jar, but the results seem consistent
across a few different ones)

The gain is achieved by not going via the ThreadLocal<StringCoding.Result> resultCache,
which checks out when inspecting the perfasm output.
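
To illustrate the idea outside the JDK, here is a minimal stand-alone sketch of the
ASCII fast path. It is not the patch itself: the class name AsciiFastPathSketch and
the method decodeUtf8 are hypothetical, hasNegatives is a plain-Java stand-in for the
package-private helper used in the diff, and the slow path uses the public replacing
UTF-8 decoder rather than the NoRepl variant.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; the real code lives in java.lang.StringCoding
// and relies on package-private helpers that user code cannot call.
final class AsciiFastPathSketch {

    // Plain-Java stand-in for StringCoding.hasNegatives: returns true if any
    // byte in [off, off + len) has its high bit set, i.e. is not 7-bit ASCII.
    static boolean hasNegatives(byte[] bytes, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (bytes[i] < 0) {
                return true;
            }
        }
        return false;
    }

    static String decodeUtf8(byte[] bytes, int off, int len) {
        if (!hasNegatives(bytes, off, len)) {
            // Pure ASCII is byte-for-byte identical in UTF-8 and ISO-8859-1,
            // so no UTF-8 decoding (and no intermediate result object) is needed.
            return new String(bytes, off, len, StandardCharsets.ISO_8859_1);
        }
        // General path: full UTF-8 decoding (replacing, unlike the NoRepl method).
        return new String(bytes, off, len, StandardCharsets.UTF_8);
    }
}
```

For typical zip entry names, which are mostly ASCII, only the first branch would be
taken.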

I'm a bit skeptical of ThreadLocal caching optimizations for such small objects (StringCoding.Result), and wonder if there's something else we can do to help the
optimizer out here, possibly eliminating the allocation entirely.
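
As a rough sketch of the two shapes being compared (not the JDK code): the Result
class below only mimics StringCoding.Result, and ResultHolderSketch, decodeCached and
decodeFresh are hypothetical names. Whether the JIT actually removes the allocation
in the second variant depends on inlining and escape analysis, so this is something
to measure rather than assume.

```java
import java.util.Arrays;

// Hypothetical sketch; Result only mimics the shape of StringCoding.Result.
final class ResultHolderSketch {

    static final class Result {
        byte[] value;
        byte coder;

        Result with(byte[] value, byte coder) {
            this.value = value;
            this.coder = coder;
            return this;
        }
    }

    // Variant A: one mutable Result per thread, reused across calls.
    // Saves an allocation but pays for a ThreadLocal lookup on every call.
    private static final ThreadLocal<Result> RESULT_CACHE =
            ThreadLocal.withInitial(Result::new);

    static Result decodeCached(byte[] bytes, int off, int len) {
        byte[] copy = Arrays.copyOfRange(bytes, off, off + len);
        return RESULT_CACHE.get().with(copy, (byte) 0);
    }

    // Variant B: a fresh, short-lived Result per call. If the caller is inlined
    // and the object does not escape, the JIT may be able to scalar-replace it.
    static Result decodeFresh(byte[] bytes, int off, int len) {
        byte[] copy = Arrays.copyOfRange(bytes, off, off + len);
        return new Result().with(copy, (byte) 0);
    }
}
```

Note that with the cached variant, a caller must consume the Result before the next
decode on the same thread, since the holder is overwritten.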

Thanks!

/Claes
