As a point of interest, some investigation of updating StringJoiner for
CompactStrings was done a while back.
See https://bugs.openjdk.java.net/browse/JDK-8148937
-Brent
On 2/3/20 2:38 PM, Сергей Цыпанов wrote:
Hello,
as of JDK 14, java.util.StringJoiner still uses a char[] to store the joined
Strings, even when all of the joined Strings, as well as the delimiter, prefix
and suffix, contain only ASCII characters.
As a result, when StringJoiner.toString() is invoked, the byte[] stored in each
String is inflated to fill the char[], and the char[] is then compressed back
into a byte[] when the String constructor is called:
    String delimiter = this.delimiter;
    char[] chars = new char[this.len + addLen];
    int k = getChars(this.prefix, chars, 0);
    if (size > 0) {
        k += getChars(elts[0], chars, k);    // inflate byte[] -> char[]
        for (int i = 1; i < size; ++i) {
            k += getChars(delimiter, chars, k);
            k += getChars(elts[i], chars, k);
        }
    }
    k += getChars(this.suffix, chars, k);
    return new String(chars);                // compress char[] -> byte[]
This can be improved by detecting the case where String.isLatin1() returns true
for all Strings involved.
I've prepared a patch, together with a benchmark indicating that the change is
correct and brings an improvement.
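To make the idea concrete, here is a minimal user-level sketch of such a fast
path. It is not the patch itself: the class Latin1JoinSketch and its methods
are hypothetical names, fitsLatin1() is a char-scanning stand-in for the
package-private String.isLatin1() (which is not reachable from here), and the
result is built with the public String(byte[], Charset) constructor, which
still copies the array once.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration only; not the JDK patch discussed in this thread.
final class Latin1JoinSketch {

    // Stand-in for String.isLatin1(): true when every char fits in one byte.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Writes the low byte of each char of s into dst; returns bytes written.
    // Only meaningful when fitsLatin1(s) is true.
    static int putBytes(String s, byte[] dst, int offset) {
        for (int i = 0; i < s.length(); i++) {
            dst[offset + i] = (byte) s.charAt(i);
        }
        return s.length();
    }

    // Copies the chars of s into dst; returns chars written.
    static int putChars(String s, char[] dst, int offset) {
        s.getChars(0, s.length(), dst, offset);
        return s.length();
    }

    static String join(String prefix, String delimiter, String suffix,
                       List<String> elts) {
        boolean latin1 = fitsLatin1(prefix) && fitsLatin1(delimiter)
                && fitsLatin1(suffix);
        int len = prefix.length() + suffix.length();
        if (!elts.isEmpty()) {
            len += delimiter.length() * (elts.size() - 1);
        }
        for (String e : elts) {
            len += e.length();
            latin1 = latin1 && fitsLatin1(e);
        }
        return latin1
                ? joinLatin1(prefix, delimiter, suffix, elts, len)
                : joinChars(prefix, delimiter, suffix, elts, len);
    }

    // Fast path: fill a byte[] directly, no char[] round trip.
    private static String joinLatin1(String prefix, String delimiter,
                                     String suffix, List<String> elts, int len) {
        byte[] bytes = new byte[len];
        int k = putBytes(prefix, bytes, 0);
        for (int i = 0; i < elts.size(); i++) {
            if (i > 0) {
                k += putBytes(delimiter, bytes, k);
            }
            k += putBytes(elts.get(i), bytes, k);
        }
        putBytes(suffix, bytes, k);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Fallback: same shape as the existing char[]-based code.
    private static String joinChars(String prefix, String delimiter,
                                    String suffix, List<String> elts, int len) {
        char[] chars = new char[len];
        int k = putChars(prefix, chars, 0);
        for (int i = 0; i < elts.size(); i++) {
            if (i > 0) {
                k += putChars(delimiter, chars, k);
            }
            k += putChars(elts.get(i), chars, k);
        }
        putChars(suffix, chars, k);
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(join("[", ", ", "]", List.of("a", "b", "c"))); // prints [a, b, c]
    }
}
```

Inside the JDK the check would presumably be much cheaper than the scan used
here, and the byte[] could be handed to the String without the extra copy;
both points are assumptions about the real implementation.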
My only concern is about String.isLatin1(): since String belongs to java.lang
and StringJoiner to java.util, the package-private String.isLatin1() cannot be
accessed directly, so we would need to make it public for the code to compile.
Another solution is to create an intermediate utility class located in
java.lang which delegates the call to String.isLatin1():
    package java.lang;

    public class StringHelper {
        public static boolean isLatin1(String str) {
            return str.isLatin1();
        }
    }
This allows us to keep java.lang.String intact and still access its
package-private method from outside the java.lang package.
Below are the benchmark results for the case described above (all Strings are
Latin1). The other case (at least one String is UTF-16) uses the existing code,
so there should be only a tiny regression due to a few extra if-checks.
With best regards,
Sergey Tsypanov
Benchmark                         (count)  (length)  Original         Patched          Units
stringJoiner                      1        1         26.7 ± 1.3       38.2 ± 1.1       ns/op
stringJoiner                      1        5         27.4 ± 0.0       40.5 ± 2.2       ns/op
stringJoiner                      1        10        29.6 ± 1.9       38.4 ± 1.9       ns/op
stringJoiner                      1        100       61.1 ± 6.9       47.6 ± 0.6       ns/op
stringJoiner                      5        1         91.1 ± 6.7       83.6 ± 2.0       ns/op
stringJoiner                      5        5         96.1 ± 10.7      85.6 ± 1.1       ns/op
stringJoiner                      5        10        105.5 ± 14.3     84.7 ± 1.1       ns/op
stringJoiner                      5        100       266.6 ± 30.1     139.6 ± 14.0     ns/op
stringJoiner                      10       1         190.7 ± 23.0     162.0 ± 2.9      ns/op
stringJoiner                      10       5         200.0 ± 16.9     167.5 ± 11.0     ns/op
stringJoiner                      10       10        216.4 ± 12.4     164.8 ± 1.7      ns/op
stringJoiner                      10       100       545.3 ± 49.7     282.2 ± 12.0     ns/op
stringJoiner                      100      1         1467.0 ± 90.3    1302.0 ± 18.5    ns/op
stringJoiner                      100      5         1491.8 ± 166.2   1493.0 ± 135.4   ns/op
stringJoiner                      100      10        1768.8 ± 160.6   1760.8 ± 111.4   ns/op
stringJoiner                      100      100       3654.3 ± 113.1   3120.9 ± 175.9   ns/op
stringJoiner:·gc.alloc.rate.norm  1        1         120.0 ± 0.0      120.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  1        5         128.0 ± 0.0      120.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  1        10        144.0 ± 0.0      136.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  1        100       416.0 ± 0.0      312.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  5        1         144.0 ± 0.0      136.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  5        5         200.0 ± 0.0      168.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  5        10        272.0 ± 0.0      216.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  5        100       1632.0 ± 0.0     1128.0 ± 0.0     B/op
stringJoiner:·gc.alloc.rate.norm  10       1         256.0 ± 0.0      232.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  10       5         376.0 ± 0.0      312.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  10       10        520.0 ± 0.0      408.0 ± 0.0      B/op
stringJoiner:·gc.alloc.rate.norm  10       100       3224.1 ± 0.0     2216.1 ± 0.0     B/op
stringJoiner:·gc.alloc.rate.norm  100      1         1760.2 ± 14.9    1544.2 ± 0.0     B/op
stringJoiner:·gc.alloc.rate.norm  100      5         2960.3 ± 14.9    2344.2 ± 0.0     B/op
stringJoiner:·gc.alloc.rate.norm  100      10        4440.4 ± 0.0     3336.3 ± 0.0     B/op
stringJoiner:·gc.alloc.rate.norm  100      100       31449.3 ± 12.2   21346.7 ± 14.7   B/op