Re: RFR: 8282429: StringBuilder/StringBuffer.toString() skip compressing for UTF16 strings [v5]

ExE Boss Thu, 29 May 2025 04:44:42 -0700

On Wed, 23 Mar 2022 00:35:14 GMT, Xin Liu <x...@openjdk.org> wrote:

>> If AbstractStringBuilder only grow, the inflated value which has been 
>> encoded in UTF16 can't be compressed. 
>> toString() can skip compression in this case. This can save an 
>> ArrayAllocation in StringUTF16::compress().
>> 
>> java.io.BufferedRead::readLine() is a case that StringBuilder grows only. 
>> 
>> In microbench, we expect to see that allocation/op reduces 20%.  The initial 
>> capacity of StringBuilder is S in bytes. When it encounters the 1st 
>> character that can't be encoded in LATIN1, it inflates and allocate a new 
>> array of 2*S. `toString()` will try to compress that value so it need to 
>> allocate S bytes. The last step allocates 2*S bytes because it has to copy 
>> the string.  so it requires to allocate 5 * S bytes in total.  By skipping 
>> the failed compression, it only allocates 4 * S bytes.  that's 20%. In real 
>> execution, we observe 16% allocation reduction, tracked by JMH GC profiler 
>> `gc.alloc.rate.norm `.  I think it's because HotSpot can't track all 
>> allocations. 
>> 
>> Not only allocation drops, the runtime performance(ns/op) also increases 
>> from 3.34% to 18.91%. 
>> 
>> Before: 
>> 
>> $$make test 
>> TEST="micro:org.openjdk.bench.java.lang.StringBuilders.toStringWithMixedChars"
>>  MICRO="OPTIONS=-prof gc -gc true -o before.log -jvm 
>> $HOME/Devel/jdk_baseline/bin/java" 
>> 
>> Benchmark                                                               
>> (MIXED_SIZE)  Mode  Cnt     Score     Error   Units
>> StringBuilders.toStringWithMixedChars                                        
>>     128  avgt   15   649.846 ±  76.291   ns/op
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate                         
>>     128  avgt   15   872.855 ± 128.259  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.alloc.rate.norm                    
>>     128  avgt   15   880.121 ±   0.050    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space                
>>     128  avgt   15   707.730 ± 194.421  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Eden_Space.norm           
>>     128  avgt   15   706.602 ±  94.504    B/op
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space            
>>     128  avgt   15     0.001 ±   0.002  MB/sec
>> StringBuilders.toStringWithMixedChars:·gc.churn.G1_Survivor_Space.norm       
>>     128  avgt   15     0.001 ±   0.001    B/op
>> StringBuilders.toStringWithMixedChars:·gc.count                              
>>     128  avgt   15   113.000            counts
>> StringBuilders.toStringWithMixedChars:·gc.time                               
>>     ...
>
> Xin Liu has updated the pull request incrementally with one additional commit 
> since the last revision:
> 
>   rewording the comment of 'maybeLatin1'.


This introduced a bug where the following now incorrectly returns `false` with 
`‑XX:⁠+CompactStrings` as it produces an inflated `String` containing only 
Latin1 characters:

new StringBuilder()
        // insert Latin1 sub-slice "foo" of UTF-16 string:
        .insert(0, "foo \uF000", 0, 3)
        .toString()
        .equals("foo");


Such a `String` can cause many weird bugs.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/7671#issuecomment-2919134074

Re: RFR: 8282429: StringBuilder/StringBuffer.toString() skip compressing for UTF16 strings [v5]

Reply via email to