On Thu, 21 Jan 2021 20:48:33 GMT, Claes Redestad <[email protected]> wrote:
>> The `StringCoding.resultCached` mechanism is used to remove the allocation
>> of a `StringCoding.Result` object on potentially hot paths used in some
>> `String` constructors. Using a `ThreadLocal` has overheads though, and the
>> overhead got a bit worse after JDK-8258596 which addresses a leak by adding
>> a `SoftReference`.
>>
>> This patch refactors much of the decode logic back into `String` and gets
>> rid of not only the `Result` cache, but the `Result` class itself along with
>> the `StringDecoder` class and cache.
>>
>> Microbenchmark results:
>> Baseline
>>
>> Benchmark (charsetName) Mode Cnt
>> Score Error Units
>> decodeCharset US-ASCII avgt 15
>> 193.043 ± 8.207 ns/op
>> decodeCharset:·gc.alloc.rate.norm US-ASCII avgt 15
>> 112.009 ± 0.001 B/op
>> decodeCharset ISO-8859-1 avgt 15
>> 164.580 ± 6.514 ns/op
>> decodeCharset:·gc.alloc.rate.norm ISO-8859-1 avgt 15
>> 112.009 ± 0.001 B/op
>> decodeCharset UTF-8 avgt 15
>> 328.370 ± 18.420 ns/op
>> decodeCharset:·gc.alloc.rate.norm UTF-8 avgt 15
>> 224.019 ± 0.002 B/op
>> decodeCharset MS932 avgt 15
>> 328.870 ± 20.056 ns/op
>> decodeCharset:·gc.alloc.rate.norm MS932 avgt 15
>> 232.020 ± 0.002 B/op
>> decodeCharset ISO-8859-6 avgt 15
>> 193.603 ± 12.305 ns/op
>> decodeCharset:·gc.alloc.rate.norm ISO-8859-6 avgt 15
>> 112.010 ± 0.001 B/op
>> decodeCharsetName US-ASCII avgt 15
>> 209.454 ± 9.040 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm US-ASCII avgt 15
>> 112.009 ± 0.001 B/op
>> decodeCharsetName ISO-8859-1 avgt 15
>> 188.234 ± 7.533 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm ISO-8859-1 avgt 15
>> 112.009 ± 0.001 B/op
>> decodeCharsetName UTF-8 avgt 15
>> 399.463 ± 12.437 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm UTF-8 avgt 15
>> 224.019 ± 0.003 B/op
>> decodeCharsetName MS932 avgt 15
>> 358.839 ± 15.385 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm MS932 avgt 15
>> 208.017 ± 0.003 B/op
>> decodeCharsetName ISO-8859-6 avgt 15
>> 162.570 ± 7.090 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm ISO-8859-6 avgt 15
>> 112.009 ± 0.001 B/op
>> decodeDefault N/A avgt 15
>> 316.081 ± 11.101 ns/op
>> decodeDefault:·gc.alloc.rate.norm N/A avgt 15
>> 224.019 ± 0.002 B/op
>>
>> Patched:
>> Benchmark (charsetName) Mode Cnt
>> Score Error Units
>> decodeCharset US-ASCII avgt 15
>> 159.153 ± 6.082 ns/op
>> decodeCharset:·gc.alloc.rate.norm US-ASCII avgt 15
>> 112.009 ± 0.001 B/op
>> decodeCharset ISO-8859-1 avgt 15
>> 134.433 ± 6.203 ns/op
>> decodeCharset:·gc.alloc.rate.norm ISO-8859-1 avgt 15
>> 112.009 ± 0.001 B/op
>> decodeCharset UTF-8 avgt 15
>> 297.234 ± 21.654 ns/op
>> decodeCharset:·gc.alloc.rate.norm UTF-8 avgt 15
>> 224.019 ± 0.002 B/op
>> decodeCharset MS932 avgt 15
>> 311.583 ± 16.445 ns/op
>> decodeCharset:·gc.alloc.rate.norm MS932 avgt 15
>> 208.018 ± 0.002 B/op
>> decodeCharset ISO-8859-6 avgt 15
>> 156.216 ± 6.522 ns/op
>> decodeCharset:·gc.alloc.rate.norm ISO-8859-6 avgt 15
>> 112.010 ± 0.001 B/op
>> decodeCharsetName US-ASCII avgt 15
>> 180.752 ± 9.411 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm US-ASCII avgt 15
>> 112.010 ± 0.001 B/op
>> decodeCharsetName ISO-8859-1 avgt 15
>> 156.170 ± 8.003 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm ISO-8859-1 avgt 15
>> 112.010 ± 0.001 B/op
>> decodeCharsetName UTF-8 avgt 15
>> 370.337 ± 22.199 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm UTF-8 avgt 15
>> 224.019 ± 0.001 B/op
>> decodeCharsetName MS932 avgt 15
>> 312.589 ± 15.067 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm MS932 avgt 15
>> 208.018 ± 0.002 B/op
>> decodeCharsetName ISO-8859-6 avgt 15
>> 173.205 ± 9.647 ns/op
>> decodeCharsetName:·gc.alloc.rate.norm ISO-8859-6 avgt 15
>> 112.009 ± 0.001 B/op
>> decodeDefault N/A avgt 15
>> 303.459 ± 16.452 ns/op
>> decodeDefault:·gc.alloc.rate.norm N/A avgt 15
>> 224.019 ± 0.001 B/op
>>
>> Most variants improve. There's a small added overhead in `String
>> charsetName` variants for some charsets such as `ISO-8859-6` that benefited
>> slightly from the `StringDecoder` cache due avoiding a lookup, but most
>> variants are not helped by this cache and instead see a significant gain
>> from skipping that step. `Charset` variants don't need a lookup and improve
>> across the board.
>>
>> Another drawback is that we need to cram more logic into `String` to bypass
>> the `StringCoding.Result` indirection - but getting rid of two commonly used
>> `ThreadLocal` caches and most cases getting a bit better raw throughput in
>> the process I think more than enough makes up for that.
>>
>> Testing: tier1-4
>
> Claes Redestad has updated the pull request incrementally with one additional
> commit since the last revision:
>
> Reduce code duplication in getBytes/getBytesNoRepl
This looks very good. Thanks for the extra refactoring and consolidating of
code.
src/java.base/share/classes/java/lang/StringCoding.java line 42:
> 40: static final Charset ISO_8859_1 = sun.nio.cs.ISO_8859_1.INSTANCE;
> 41: static final Charset US_ASCII = sun.nio.cs.US_ASCII.INSTANCE;
> 42: static final Charset UTF_8 = sun.nio.cs.UTF_8.INSTANCE;
These could move to String also, if there is a benefit them being local.
Otherwise, the uses in String could refer directly to the INSTANCEs in the
sun.nio.cs... classes.
src/java.base/share/classes/java/lang/StringCoding.java line 49:
> 47: * @param msg message to print
> 48: */
> 49: private static native void err(String msg);
A separate cleanup could remove this unused method and the corresponding native
code in StringCoding.c.
src/java.base/share/classes/java/lang/StringCoding.java line 64:
> 62: public static int implEncodeISOArray(byte[] sa, int sp,
> 63: byte[] da, int dp, int len) {
> 64: int i = 0;
As a separate cleanup...
If there isn't any value to these intrinsics being in StringCoder, they could
also move to String
with the corresponding intrinsic references.
-------------
Marked as reviewed by rriggs (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/2102