Hello,

As of JDK 14, java.util.StringJoiner still uses char[] as storage for the
joined Strings.

This matters when all joined Strings, as well as the delimiter, prefix and
suffix, contain only Latin-1 symbols (ASCII, for example), i.e. when each of
them is backed by a compact byte[].

As a result, when StringJoiner.toString() is invoked, the byte[] stored in each
String is inflated in order to fill the char[], and the char[] is then
compressed again when the String constructor is called:

String delimiter = this.delimiter;
char[] chars = new char[this.len + addLen];
int k = getChars(this.prefix, chars, 0);
if (size > 0) {
    k += getChars(elts[0], chars, k);        // inflate byte[] -> char[]

    for(int i = 1; i < size; ++i) {
        k += getChars(delimiter, chars, k);
        k += getChars(elts[i], chars, k);
    }
}

k += getChars(this.suffix, chars, k);
return new String(chars);                    // compress char[] -> byte[]

This can be improved by detecting cases when String.isLatin1() returns true for 
all involved Strings.
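
To illustrate the idea outside the JDK, here is a minimal sketch that uses only
public API. It is not the patch itself: the class Latin1JoinSketch and its
methods fitsLatin1, putBytes and join are hypothetical names chosen for this
example, and fitsLatin1 merely approximates the package-private
String.isLatin1() by scanning for chars above 0xFF. The sketch shows the
control flow only (one byte[] for the result, fallback for anything
non-Latin-1); it does not reproduce the performance of the patch, because
String.getBytes(Charset) allocates a temporary array for every element.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class Latin1JoinSketch {

    // Hypothetical stand-in for the package-private String.isLatin1():
    // true when every char fits into a single byte (<= 0xFF).
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Copies the Latin-1 bytes of s into dst starting at 'start'
    // and returns the number of bytes copied.
    static int putBytes(String s, byte[] dst, int start) {
        byte[] src = s.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(src, 0, dst, start, src.length);
        return src.length;
    }

    static String join(String prefix, String delimiter, String suffix, List<String> elts) {
        boolean allLatin1 = fitsLatin1(prefix) && fitsLatin1(delimiter) && fitsLatin1(suffix);
        int len = prefix.length() + suffix.length();
        for (String elt : elts) {
            allLatin1 &= fitsLatin1(elt);
            len += elt.length();
        }
        if (!elts.isEmpty()) {
            len += delimiter.length() * (elts.size() - 1);
        }

        if (!allLatin1) {
            // At least one String needs UTF-16: fall back to the ordinary path.
            return prefix + String.join(delimiter, elts) + suffix;
        }

        // All Latin-1: one byte per char, so 'len' is also the byte count.
        byte[] bytes = new byte[len];
        int k = putBytes(prefix, bytes, 0);
        for (int i = 0; i < elts.size(); i++) {
            if (i > 0) {
                k += putBytes(delimiter, bytes, k);
            }
            k += putBytes(elts.get(i), bytes, k);
        }
        putBytes(suffix, bytes, k);
        // Decode explicitly as ISO-8859-1 so that bytes 0x80..0xFF survive
        // regardless of the platform default charset.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

For example, Latin1JoinSketch.join("[", ", ", "]", List.of("a", "b", "c"))
takes the byte[] branch, while a list containing a CJK character takes the
fallback branch.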

I've prepared a patch along with a benchmark showing that this change is
correct and brings an improvement.
The only concern I have is about String.isLatin1(): since String belongs to
java.lang and StringJoiner to java.util, the package-private String.isLatin1()
cannot be accessed directly, so we need to make it public for the code to
compile.

Another solution is to create an intermediate utility class located in 
java.lang which delegates the call to String.isLatin1():

package java.lang;

public class StringHelper {
    public static boolean isLatin1(String str) {
        return str.isLatin1();
    }
}

This keeps java.lang.String intact while giving code outside the java.lang
package access to its package-private method.

Below I've added the benchmark results for the case in question (all Strings
are Latin-1). The other case (at least one String is UTF-16) uses the existing
code, so there will be only a tiny regression due to several if-checks.

With best regards,
Sergey Tsypanov



Benchmark                        (count)  (length)          Original           Patched  Units
stringJoiner                           1         1      26.7 ±   1.3      38.2 ±   1.1  ns/op
stringJoiner                           1         5      27.4 ±   0.0      40.5 ±   2.2  ns/op
stringJoiner                           1        10      29.6 ±   1.9      38.4 ±   1.9  ns/op
stringJoiner                           1       100      61.1 ±   6.9      47.6 ±   0.6  ns/op
stringJoiner                           5         1      91.1 ±   6.7      83.6 ±   2.0  ns/op
stringJoiner                           5         5      96.1 ±  10.7      85.6 ±   1.1  ns/op
stringJoiner                           5        10     105.5 ±  14.3      84.7 ±   1.1  ns/op
stringJoiner                           5       100     266.6 ±  30.1     139.6 ±  14.0  ns/op
stringJoiner                          10         1     190.7 ±  23.0     162.0 ±   2.9  ns/op
stringJoiner                          10         5     200.0 ±  16.9     167.5 ±  11.0  ns/op
stringJoiner                          10        10     216.4 ±  12.4     164.8 ±   1.7  ns/op
stringJoiner                          10       100     545.3 ±  49.7     282.2 ±  12.0  ns/op
stringJoiner                         100         1    1467.0 ±  90.3    1302.0 ±  18.5  ns/op
stringJoiner                         100         5    1491.8 ± 166.2    1493.0 ± 135.4  ns/op
stringJoiner                         100        10    1768.8 ± 160.6    1760.8 ± 111.4  ns/op
stringJoiner                         100       100    3654.3 ± 113.1    3120.9 ± 175.9  ns/op

stringJoiner:·gc.alloc.rate.norm       1         1     120.0 ±   0.0     120.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm       1         5     128.0 ±   0.0     120.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm       1        10     144.0 ±   0.0     136.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm       1       100     416.0 ±   0.0     312.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm       5         1     144.0 ±   0.0     136.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm       5         5     200.0 ±   0.0     168.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm       5        10     272.0 ±   0.0     216.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm       5       100    1632.0 ±   0.0    1128.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm      10         1     256.0 ±   0.0     232.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm      10         5     376.0 ±   0.0     312.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm      10        10     520.0 ±   0.0     408.0 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm      10       100    3224.1 ±   0.0    2216.1 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm     100         1    1760.2 ±  14.9    1544.2 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm     100         5    2960.3 ±  14.9    2344.2 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm     100        10    4440.4 ±   0.0    3336.3 ±   0.0  B/op
stringJoiner:·gc.alloc.rate.norm     100       100   31449.3 ±  12.2   21346.7 ±  14.7  B/op





diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java
--- a/src/java.base/share/classes/java/lang/String.java
+++ b/src/java.base/share/classes/java/lang/String.java
@@ -3699,7 +3699,11 @@
         return value;
     }
 
-    boolean isLatin1() {
+    /**
+     * Returns true if this String is stored in the compact Latin-1 form
+     * @return whether all symbols of this String fit into Latin-1
+     */
+    public boolean isLatin1() {
         return COMPACT_STRINGS && coder == LATIN1;
     }
 
diff --git a/src/java.base/share/classes/java/util/StringJoiner.java b/src/java.base/share/classes/java/util/StringJoiner.java
--- a/src/java.base/share/classes/java/util/StringJoiner.java
+++ b/src/java.base/share/classes/java/util/StringJoiner.java
@@ -83,6 +83,8 @@
      */
     private String emptyValue;
 
+    private boolean allLatin1;
+
     /**
      * Constructs a {@code StringJoiner} with no characters in it, with no
      * {@code prefix} or {@code suffix}, and a copy of the supplied
@@ -125,6 +127,7 @@
         this.prefix = prefix.toString();
         this.delimiter = delimiter.toString();
         this.suffix = suffix.toString();
+        this.allLatin1 = this.prefix.isLatin1() && this.delimiter.isLatin1() && this.suffix.isLatin1();
     }
 
     /**
@@ -144,6 +147,7 @@
     public StringJoiner setEmptyValue(CharSequence emptyValue) {
         this.emptyValue = Objects.requireNonNull(emptyValue,
             "The empty value must not be null").toString();
+        this.allLatin1 &= this.emptyValue.isLatin1();
         return this;
     }
 
@@ -153,6 +157,13 @@
         return len;
     }
 
+    @SuppressWarnings("deprecation")
+    private static int getBytes(String s, byte[] bytes, int start) {
+        int len = s.length();
+        s.getBytes(0, len, bytes, start);
+        return len;
+    }
+
     /**
      * Returns the current value, consisting of the {@code prefix}, the values
      * added so far separated by the {@code delimiter}, and the {@code suffix},
@@ -173,6 +184,13 @@
             compactElts();
             return size == 0 ? "" : elts[0];
         }
+        if (allLatin1) {
+            return bytesToString(elts, size, addLen);
+        }
+        return charsToString(elts, size, addLen);
+    }
+
+    private String charsToString(String[] elts, int size, int addLen) {
         final String delimiter = this.delimiter;
         final char[] chars = new char[len + addLen];
         int k = getChars(prefix, chars, 0);
@@ -183,10 +201,25 @@
                 k += getChars(elts[i], chars, k);
             }
         }
-        k += getChars(suffix, chars, k);
+        getChars(suffix, chars, k);
         return new String(chars);
     }
 
+    private String bytesToString(String[] elts, int size, int addLen) {
+        final String delimiter = this.delimiter;
+        final byte[] bytes = new byte[len + addLen];
+        int k = getBytes(prefix, bytes, 0);
+        if (size > 0) {
+            k += getBytes(elts[0], bytes, k);
+            for (int i = 1; i < size; i++) {
+                k += getBytes(delimiter, bytes, k);
+                k += getBytes(elts[i], bytes, k);
+            }
+        }
+        getBytes(suffix, bytes, k);
+        return new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1); // not the default charset
+    }
+
     /**
      * Adds a copy of the given {@code CharSequence} value as the next
      * element of the {@code StringJoiner} value. If {@code newElement} is
@@ -206,6 +239,7 @@
         }
         len += elt.length();
         elts[size++] = elt;
+        allLatin1 &= elt.isLatin1();
         return this;
     }
 
@@ -239,18 +273,38 @@
 
     private void compactElts() {
         if (size > 1) {
-            final char[] chars = new char[len];
-            int i = 1, k = getChars(elts[0], chars, 0);
-            do {
-                k += getChars(delimiter, chars, k);
-                k += getChars(elts[i], chars, k);
-                elts[i] = null;
-            } while (++i < size);
-            size = 1;
-            elts[0] = new String(chars);
+            if (allLatin1) {
+                compactBytes();
+            } else {
+                compactChars();
+            }
         }
     }
 
+    private void compactChars() {
+        final char[] chars = new char[len];
+        int i = 1, k = getChars(elts[0], chars, 0);
+        do {
+            k += getChars(delimiter, chars, k);
+            k += getChars(elts[i], chars, k);
+            elts[i] = null;
+        } while (++i < size);
+        size = 1;
+        elts[0] = new String(chars);
+    }
+
+    private void compactBytes() {
+        final byte[] bytes = new byte[len];
+        int i = 1, k = getBytes(elts[0], bytes, 0);
+        do {
+            k += getBytes(delimiter, bytes, k);
+            k += getBytes(elts[i], bytes, k);
+            elts[i] = null;
+        } while (++i < size);
+        size = 1;
+        elts[0] = new String(bytes, java.nio.charset.StandardCharsets.ISO_8859_1); // not the default charset
+    }
+
     /**
      * Returns the length of the {@code String} representation
      * of this {@code StringJoiner}. Note that if
