On Fri, 15 Jan 2021 19:11:38 GMT, Naoto Sato <na...@openjdk.org> wrote:
>> The `StringCoding.resultCached` mechanism is used to remove the allocation >> of a `StringCoding.Result` object on potentially hot paths used in some >> `String` constructors. Using a `ThreadLocal` has overheads though, and the >> overhead got a bit worse after JDK-8258596 which addresses a leak by adding >> a `SoftReference`. >> >> This patch refactors much of the decode logic back into `String` and gets >> rid of not only the `Result` cache, but the `Result` class itself along with >> the `StringDecoder` class and cache. >> >> Microbenchmark results: >> Baseline >> >> Benchmark (charsetName) Mode Cnt >> Score Error Units >> decodeCharset US-ASCII avgt 15 >> 193.043 ± 8.207 ns/op >> decodeCharset:·gc.alloc.rate.norm US-ASCII avgt 15 >> 112.009 ± 0.001 B/op >> decodeCharset ISO-8859-1 avgt 15 >> 164.580 ± 6.514 ns/op >> decodeCharset:·gc.alloc.rate.norm ISO-8859-1 avgt 15 >> 112.009 ± 0.001 B/op >> decodeCharset UTF-8 avgt 15 >> 328.370 ± 18.420 ns/op >> decodeCharset:·gc.alloc.rate.norm UTF-8 avgt 15 >> 224.019 ± 0.002 B/op >> decodeCharset MS932 avgt 15 >> 328.870 ± 20.056 ns/op >> decodeCharset:·gc.alloc.rate.norm MS932 avgt 15 >> 232.020 ± 0.002 B/op >> decodeCharset ISO-8859-6 avgt 15 >> 193.603 ± 12.305 ns/op >> decodeCharset:·gc.alloc.rate.norm ISO-8859-6 avgt 15 >> 112.010 ± 0.001 B/op >> decodeCharsetName US-ASCII avgt 15 >> 209.454 ± 9.040 ns/op >> decodeCharsetName:·gc.alloc.rate.norm US-ASCII avgt 15 >> 112.009 ± 0.001 B/op >> decodeCharsetName ISO-8859-1 avgt 15 >> 188.234 ± 7.533 ns/op >> decodeCharsetName:·gc.alloc.rate.norm ISO-8859-1 avgt 15 >> 112.009 ± 0.001 B/op >> decodeCharsetName UTF-8 avgt 15 >> 399.463 ± 12.437 ns/op >> decodeCharsetName:·gc.alloc.rate.norm UTF-8 avgt 15 >> 224.019 ± 0.003 B/op >> decodeCharsetName MS932 avgt 15 >> 358.839 ± 15.385 ns/op >> decodeCharsetName:·gc.alloc.rate.norm MS932 avgt 15 >> 208.017 ± 0.003 B/op >> decodeCharsetName ISO-8859-6 avgt 15 >> 162.570 ± 7.090 ns/op >> decodeCharsetName:·gc.alloc.rate.norm ISO-8859-6 avgt 15 >> 112.009 ± 0.001 B/op >> decodeDefault N/A avgt 15 >> 316.081 ± 11.101 ns/op >> decodeDefault:·gc.alloc.rate.norm N/A avgt 15 >> 224.019 ± 0.002 B/op >> >> Patched: >> Benchmark (charsetName) Mode Cnt >> Score Error Units >> decodeCharset US-ASCII avgt 15 >> 159.153 ± 6.082 ns/op >> decodeCharset:·gc.alloc.rate.norm US-ASCII avgt 15 >> 112.009 ± 0.001 B/op >> decodeCharset ISO-8859-1 avgt 15 >> 134.433 ± 6.203 ns/op >> decodeCharset:·gc.alloc.rate.norm ISO-8859-1 avgt 15 >> 112.009 ± 0.001 B/op >> decodeCharset UTF-8 avgt 15 >> 297.234 ± 21.654 ns/op >> decodeCharset:·gc.alloc.rate.norm UTF-8 avgt 15 >> 224.019 ± 0.002 B/op >> decodeCharset MS932 avgt 15 >> 311.583 ± 16.445 ns/op >> decodeCharset:·gc.alloc.rate.norm MS932 avgt 15 >> 208.018 ± 0.002 B/op >> decodeCharset ISO-8859-6 avgt 15 >> 156.216 ± 6.522 ns/op >> decodeCharset:·gc.alloc.rate.norm ISO-8859-6 avgt 15 >> 112.010 ± 0.001 B/op >> decodeCharsetName US-ASCII avgt 15 >> 180.752 ± 9.411 ns/op >> decodeCharsetName:·gc.alloc.rate.norm US-ASCII avgt 15 >> 112.010 ± 0.001 B/op >> decodeCharsetName ISO-8859-1 avgt 15 >> 156.170 ± 8.003 ns/op >> decodeCharsetName:·gc.alloc.rate.norm ISO-8859-1 avgt 15 >> 112.010 ± 0.001 B/op >> decodeCharsetName UTF-8 avgt 15 >> 370.337 ± 22.199 ns/op >> decodeCharsetName:·gc.alloc.rate.norm UTF-8 avgt 15 >> 224.019 ± 0.001 B/op >> decodeCharsetName MS932 avgt 15 >> 312.589 ± 15.067 ns/op >> decodeCharsetName:·gc.alloc.rate.norm MS932 avgt 15 >> 208.018 ± 0.002 B/op >> decodeCharsetName ISO-8859-6 avgt 15 >> 173.205 ± 9.647 ns/op >> decodeCharsetName:·gc.alloc.rate.norm ISO-8859-6 avgt 15 >> 112.009 ± 0.001 B/op >> decodeDefault N/A avgt 15 >> 303.459 ± 16.452 ns/op >> decodeDefault:·gc.alloc.rate.norm N/A avgt 15 >> 224.019 ± 0.001 B/op >> >> Most variants improve. There's a small added overhead in `String >> charsetName` variants for some charsets such as `ISO-8859-6` that benefited >> slightly from the `StringDecoder` cache due avoiding a lookup, but most >> variants are not helped by this cache and instead see a significant gain >> from skipping that step. `Charset` variants don't need a lookup and improve >> across the board. >> >> Another drawback is that we need to cram more logic into `String` to bypass >> the `StringCoding.Result` indirection - but getting rid of two commonly used >> `ThreadLocal` caches and most cases getting a bit better raw throughput in >> the process I think more than enough makes up for that. >> >> Testing: tier1-4 > > src/java.base/share/classes/java/lang/String.java line 544: > >> 542: return; >> 543: } >> 544: if (charset == UTF_8) { > > The constructor is getting big. Might be better to keep the original private > methods (decodeASCII/Latin1/UTF8) for readability. Since we're calculating two final values, that split was what necessitated a `Result` object. Until valhalla I don't think there's a way to get rid of the performance cost here without putting the bulk of the logic into the constructor. Refactoring out some of the logic to utility methods could be a performance neutral way to cut down the complexity, though. E.g.: char c = (char)((b1 << 12) ^ (b2 << 6) ^ (b3 ^ (((byte) 0xE0 << 12) ^ ((byte) 0x80 << 6) ^ ((byte) 0x80 << 0)))); if (Character.isSurrogate(c)) { putChar(dst, dp++, REPL); } else { putChar(dst, dp++, c); } could be reasonably factored out and reduced to something like: putChar(dst, dp++, StringCoding.decode3(b1, b2, b3)); I've refrained from refurbishing too much, though. ------------- PR: https://git.openjdk.java.net/jdk/pull/2102