On Wed, 23 Mar 2022 00:35:14 GMT, Xin Liu <x...@openjdk.org> wrote: >> If AbstractStringBuilder only grow, the inflated value which has been >> encoded in UTF16 can't be compressed. >> toString() can skip compression in this case. This can save an >> ArrayAllocation in StringUTF16::compress(). >> >> java.io.BufferedRead::readLine() is a case that StringBuilder grows only. >> >> In microbench, we expect to see that allocation/op reduces 20%. The initial >> capacity of StringBuilder is S in bytes. When it encounters the 1st >> character that can't be encoded in LATIN1, it inflates and allocate a new >> array of 2*S. `toString()` will try to compress that value so it need to >> allocate S bytes. The last step allocates 2*S bytes because it has to copy >> the string. so it requires to allocate 5 * S bytes in total. By skipping >> the failed compression, it only allocates 4 * S bytes. that's 20%. In real >> execution, we observe 16% allocation reduction, tracked by JMH GC profiler >> `gc.alloc.rate.norm `. I think it's because HotSpot can't track all >> allocations. >> >> Not only allocation drops, the runtime performance(ns/op) also increases >> from 3.34% to 18.91%. >> >> Before: >> >> $$make test >> TEST="micro:org.openjdk.bench.java.lang.StringBuilders.toStringWithMixedChars" >> MICRO="OPTIONS=-prof gc -gc true -o before.log -jvm >> $HOME/Devel/jdk_baseline/bin/java" >> >> Benchmark >> (MIXED_SIZE) Mode Cnt Score Error Units >> StringBuilders.toStringWithMixedChars >> 128 avgt 15 649.846 ± 76.291 ns/op >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate >> 128 avgt 15 872.855 ± 128.259 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm >> 128 avgt 15 880.121 ± 0.050 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space >> 128 avgt 15 707.730 ± 194.421 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm >> 128 avgt 15 706.602 ± 94.504 B/op >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space >> 128 avgt 15 0.001 ± 0.002 MB/sec >> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm >> 128 avgt 15 0.001 ± 0.001 B/op >> StringBuilders.toStringWithMixedChars:·gc.count >> 128 avgt 15 113.000 counts >> StringBuilders.toStringWithMixedChars:·gc.time >> ... > > Xin Liu has updated the pull request incrementally with one additional commit > since the last revision: > > rewording the comment of 'maybeLatin1'.
This introduced a bug where the following now incorrectly returns `false` with `‑XX:+CompactStrings` as it produces an inflated `String` containing only Latin1 characters: new StringBuilder() // insert Latin1 sub-slice "foo" of UTF-16 string: .insert(0, "foo \uF000", 0, 3) .toString() .equals("foo"); Such a `String` can cause many weird bugs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/7671#issuecomment-2919134074