On Fri, 15 Jan 2021 19:11:38 GMT, Naoto Sato <na...@openjdk.org> wrote:

>> The `StringCoding.resultCached` mechanism is used to remove the allocation 
>> of a `StringCoding.Result` object on potentially hot paths used in some 
>> `String` constructors. Using a `ThreadLocal` has overheads though, and the 
>> overhead got a bit worse after JDK-8258596 which addresses a leak by adding 
>> a `SoftReference`.
>> 
>> This patch refactors much of the decode logic back into `String` and gets 
>> rid of not only the `Result` cache, but the `Result` class itself along with 
>> the `StringDecoder` class and cache.
>> 
>> Microbenchmark results:
>> Baseline
>> 
>> Benchmark                                           (charsetName)  Mode  Cnt 
>>    Score    Error   Units
>> decodeCharset                                            US-ASCII  avgt   15 
>>  193.043 ±  8.207   ns/op
>> decodeCharset:·gc.alloc.rate.norm                        US-ASCII  avgt   15 
>>  112.009 ±  0.001    B/op
>> decodeCharset                                          ISO-8859-1  avgt   15 
>>  164.580 ±  6.514   ns/op
>> decodeCharset:·gc.alloc.rate.norm                      ISO-8859-1  avgt   15 
>>  112.009 ±  0.001    B/op
>> decodeCharset                                               UTF-8  avgt   15 
>>  328.370 ± 18.420   ns/op
>> decodeCharset:·gc.alloc.rate.norm                           UTF-8  avgt   15 
>>  224.019 ±  0.002    B/op
>> decodeCharset                                               MS932  avgt   15 
>>  328.870 ± 20.056   ns/op
>> decodeCharset:·gc.alloc.rate.norm                           MS932  avgt   15 
>>  232.020 ±  0.002    B/op
>> decodeCharset                                          ISO-8859-6  avgt   15 
>>  193.603 ± 12.305   ns/op
>> decodeCharset:·gc.alloc.rate.norm                      ISO-8859-6  avgt   15 
>>  112.010 ±  0.001    B/op
>> decodeCharsetName                                        US-ASCII  avgt   15 
>>  209.454 ±  9.040   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                    US-ASCII  avgt   15 
>>  112.009 ±  0.001    B/op
>> decodeCharsetName                                      ISO-8859-1  avgt   15 
>>  188.234 ±  7.533   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-1  avgt   15 
>>  112.009 ±  0.001    B/op
>> decodeCharsetName                                           UTF-8  avgt   15 
>>  399.463 ± 12.437   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                       UTF-8  avgt   15 
>>  224.019 ±  0.003    B/op
>> decodeCharsetName                                           MS932  avgt   15 
>>  358.839 ± 15.385   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                       MS932  avgt   15 
>>  208.017 ±  0.003    B/op
>> decodeCharsetName                                      ISO-8859-6  avgt   15 
>>  162.570 ±  7.090   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-6  avgt   15 
>>  112.009 ±  0.001    B/op
>> decodeDefault                                                 N/A  avgt   15 
>>  316.081 ± 11.101   ns/op
>> decodeDefault:·gc.alloc.rate.norm                             N/A  avgt   15 
>>  224.019 ±  0.002    B/op
>> 
>> Patched:
>> Benchmark                                           (charsetName)  Mode  Cnt 
>>    Score    Error   Units
>> decodeCharset                                            US-ASCII  avgt   15 
>>  159.153 ±  6.082   ns/op
>> decodeCharset:·gc.alloc.rate.norm                        US-ASCII  avgt   15 
>>  112.009 ±  0.001    B/op
>> decodeCharset                                          ISO-8859-1  avgt   15 
>>  134.433 ±  6.203   ns/op
>> decodeCharset:·gc.alloc.rate.norm                      ISO-8859-1  avgt   15 
>>  112.009 ±  0.001    B/op
>> decodeCharset                                               UTF-8  avgt   15 
>>  297.234 ± 21.654   ns/op
>> decodeCharset:·gc.alloc.rate.norm                           UTF-8  avgt   15 
>>  224.019 ±  0.002    B/op
>> decodeCharset                                               MS932  avgt   15 
>>  311.583 ± 16.445   ns/op
>> decodeCharset:·gc.alloc.rate.norm                           MS932  avgt   15 
>>  208.018 ±  0.002    B/op
>> decodeCharset                                          ISO-8859-6  avgt   15 
>>  156.216 ±  6.522   ns/op
>> decodeCharset:·gc.alloc.rate.norm                      ISO-8859-6  avgt   15 
>>  112.010 ±  0.001    B/op
>> decodeCharsetName                                        US-ASCII  avgt   15 
>>  180.752 ±  9.411   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                    US-ASCII  avgt   15 
>>  112.010 ±  0.001    B/op
>> decodeCharsetName                                      ISO-8859-1  avgt   15 
>>  156.170 ±  8.003   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-1  avgt   15 
>>  112.010 ±  0.001    B/op
>> decodeCharsetName                                           UTF-8  avgt   15 
>>  370.337 ± 22.199   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                       UTF-8  avgt   15 
>>  224.019 ±  0.001    B/op
>> decodeCharsetName                                           MS932  avgt   15 
>>  312.589 ± 15.067   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                       MS932  avgt   15 
>>  208.018 ±  0.002    B/op
>> decodeCharsetName                                      ISO-8859-6  avgt   15 
>>  173.205 ±  9.647   ns/op
>> decodeCharsetName:·gc.alloc.rate.norm                  ISO-8859-6  avgt   15 
>>  112.009 ±  0.001    B/op
>> decodeDefault                                                 N/A  avgt   15 
>>  303.459 ± 16.452   ns/op
>> decodeDefault:·gc.alloc.rate.norm                             N/A  avgt   15 
>>  224.019 ±  0.001    B/op
>> 
>> Most variants improve. There's a small added overhead in `String 
>> charsetName` variants for some charsets such as `ISO-8859-6` that benefited 
>> slightly from the `StringDecoder` cache due avoiding a lookup, but most 
>> variants are not helped by this cache and instead see a significant gain 
>> from skipping that step. `Charset` variants don't need a lookup and improve 
>> across the board.
>> 
>> Another drawback is that we need to cram more logic into `String` to bypass 
>> the `StringCoding.Result` indirection - but getting rid of two commonly used 
>> `ThreadLocal` caches and most cases getting a bit better raw throughput in 
>> the process I think more than enough makes up for that.
>> 
>> Testing: tier1-4
>
> src/java.base/share/classes/java/lang/String.java line 544:
> 
>> 542:             return;
>> 543:         }
>> 544:         if (charset == UTF_8) {
> 
> The constructor is getting big. Might be better to keep the original private 
> methods (decodeASCII/Latin1/UTF8) for readability.

Since we're calculating two final values, that split was what necessitated a 
`Result` object. Until valhalla I don't think there's a way to get rid of the 
performance cost here without putting the bulk of the logic into the 
constructor.

Refactoring out some of the logic to utility methods could be a performance 
neutral way to cut down the complexity, though. E.g.:
                                 char c = (char)((b1 << 12) ^
                                                (b2 <<  6) ^
                                                (b3 ^
                                                 (((byte) 0xE0 << 12) ^
                                                  ((byte) 0x80 <<  6) ^
                                                  ((byte) 0x80 <<  0))));
                                if (Character.isSurrogate(c)) {
                                    putChar(dst, dp++, REPL);
                                } else {
                                    putChar(dst, dp++, c);
                                }
could be reasonably factored out and reduced to something like:
                                putChar(dst, dp++, StringCoding.decode3(b1, b2, 
b3));
I've refrained from refurbishing too much, though.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2102

Reply via email to