As a point of interest, some investigation of updating StringJoiner for CompactStrings was done a while back.

See https://bugs.openjdk.java.net/browse/JDK-8148937

-Brent

On 2/3/20 2:38 PM, Сергей Цыпанов wrote:
Hello,

as of JDK14 java.util.StringJoiner still uses char[] as a storage of glued 
Strings.

This applies for the cases when all joined Strings as well as delimiter, prefix 
and suffix contain only ASCII symbols.

As a result when StringJoiner.toString() is invoked, byte[] stored in String is 
inflated in order to fill in char[] and
finally char[] is compressed when constructor of String is called:

String delimiter = this.delimiter;
char[] chars = new char[this.len + addLen];
int k = getChars(this.prefix, chars, 0);
if (size > 0) {
     k += getChars(elts[0], chars, k);        // inflate byte[] -> char[]

     for(int i = 1; i < size; ++i) {
         k += getChars(delimiter, chars, k);
         k += getChars(elts[i], chars, k);
     }
}

k += getChars(this.suffix, chars, k);
return new String(chars);                    // compress char[] -> byte[]

This can be improved by detecting cases when String.isLatin1() returns true for 
all involved Strings.

I've prepared a patch along with benchmark proving that this change is correct 
and brings improvement.
The only concern I have is about String.isLatin1(): as far as String belongs to 
java.lang and StringJoiner to java.util
package-private String.isLatin1() cannot be directly accessed, we need to make 
it public for successful compilation.

Another solution is to create an intermediate utility class located in 
java.lang which delegates the call to String.isLatin1():

package java.lang;

public class StringHelper {
     public static boolean isLatin1(String str) {
         return str.isLatin1();
     }
}

This allows to keep java.lang.String intact and have access to it's 
package-private method outside of java.lang package.

Below I've added results of benchmarking for specified case (all Strings are 
Latin1). The other case (at least one String is UTF-8) uses existing code so 
there will be only a tiny regression due to several if-checks.

With best regards,
Sergey Tsypanov



                                           (count)  (length)         Original   
          Patched            Units
stringJoiner                                    1         1     26.7 ±   1.3    
    38.2 ±   1.1            ns/op
stringJoiner                                    1         5     27.4 ±   0.0    
    40.5 ±   2.2            ns/op
stringJoiner                                    1        10     29.6 ±   1.9    
    38.4 ±   1.9            ns/op
stringJoiner                                    1       100     61.1 ±   6.9    
    47.6 ±   0.6            ns/op
stringJoiner                                    5         1     91.1 ±   6.7    
    83.6 ±   2.0            ns/op
stringJoiner                                    5         5     96.1 ±  10.7    
    85.6 ±   1.1            ns/op
stringJoiner                                    5        10    105.5 ±  14.3    
    84.7 ±   1.1            ns/op
stringJoiner                                    5       100    266.6 ±  30.1    
   139.6 ±  14.0            ns/op
stringJoiner                                   10         1    190.7 ±  23.0    
   162.0 ±   2.9            ns/op
stringJoiner                                   10         5    200.0 ±  16.9    
   167.5 ±  11.0            ns/op
stringJoiner                                   10        10    216.4 ±  12.4    
   164.8 ±   1.7            ns/op
stringJoiner                                   10       100    545.3 ±  49.7    
   282.2 ±  12.0            ns/op
stringJoiner                                  100         1   1467.0 ±  90.3    
  1302.0 ±  18.5            ns/op
stringJoiner                                  100         5   1491.8 ± 166.2    
  1493.0 ± 135.4            ns/op
stringJoiner                                  100        10   1768.8 ± 160.6    
  1760.8 ± 111.4            ns/op
stringJoiner                                  100       100   3654.3 ± 113.1    
  3120.9 ± 175.9            ns/op

stringJoiner:·gc.alloc.rate.norm                1         1    120.0 ±   0.0    
   120.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                1         5    128.0 ±   0.0    
   120.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                1        10    144.0 ±   0.0    
   136.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                1       100    416.0 ±   0.0    
   312.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                5         1    144.0 ±   0.0    
   136.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                5         5    200.0 ±   0.0    
   168.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                5        10    272.0 ±   0.0    
   216.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm                5       100   1632.0 ±   0.0    
  1128.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm               10         1    256.0 ±   0.0    
   232.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm               10         5    376.0 ±   0.0    
   312.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm               10        10    520.0 ±   0.0    
   408.0 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm               10       100   3224.1 ±   0.0    
  2216.1 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm              100         1   1760.2 ±  14.9    
  1544.2 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm              100         5   2960.3 ±  14.9    
  2344.2 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm              100        10   4440.4 ±   0.0    
  3336.3 ±   0.0             B/op
stringJoiner:·gc.alloc.rate.norm              100       100  31449.3 ±  12.2    
 21346.7 ±  14.7             B/op





Reply via email to