Hello,

I have an enhancement proposal for some cases of String concatenation in Java.

Currently we concatenate Strings mostly with java.lang.StringBuilder. Its main
disadvantage is the backing array, which must be resized when an append would
exceed its capacity, with the existing content then copied into the newly
allocated array.

One existing alternative is StringJoiner. Before JDK 9 it was a kind of
decorator over StringBuilder, but it was later reworked to store the appended
Strings in a String[] and to accumulate their total length in an int field.
This makes it possible to allocate the char[] only once, at exactly the right
size, in the toString() method, which reduces allocation cost.
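
To illustrate the idea, here is a minimal sketch of that
accumulate-then-allocate-once technique. The class name ExactSizeConcat and
its methods are hypothetical, made up for this example; this is not the actual
StringJoiner source, and it skips details such as overflow checks on the total
length.

```java
import java.util.Arrays;

// Hypothetical illustration class; not the JDK's StringJoiner code.
final class ExactSizeConcat {
    private String[] parts = new String[8]; // pieces added so far
    private int size;                       // number of pieces stored
    private int totalLength;                // sum of the pieces' lengths

    ExactSizeConcat add(CharSequence cs) {
        String s = String.valueOf(cs);      // null becomes "null"
        if (size == parts.length) {
            // Only the array of references grows; no character data is copied.
            parts = Arrays.copyOf(parts, size * 2);
        }
        totalLength += s.length();
        parts[size++] = s;
        return this;
    }

    String build() {
        // Single allocation of exactly the required size.
        char[] chars = new char[totalLength];
        int offset = 0;
        for (int i = 0; i < size; i++) {
            String s = parts[i];
            s.getChars(0, s.length(), chars, offset);
            offset += s.length();
        }
        return new String(chars);
    }
}
```

The point of the sketch is that growing the String[] copies only references,
while the character data is copied exactly once, in build().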

My proposal is to copy the code of StringJoiner into a newly created class
java.lang.StringChain, drop the code responsible for the delimiter, prefix and
suffix, and use it instead of StringBuilder in the common
StringBuilder::append concatenation pattern.

Possible use-cases for proposed code are:
- plain String concatenation
- String::chain (new methods)
- Stream.collect(Collectors.joining())
- StringConcatFactory

We can create new methods String.chain(Iterable<CharSequence>) and
String.chain(CharSequence...) which encapsulate boilerplate code like


  StringBuilder sb = new StringBuilder();
  for (CharSequence cs : charSequences) {
    sb.append(cs);
  }
  String result = sb.toString();


into one line:


  String result = String.chain(charSequences);



As for performance, I've done some measurements using JMH on my work machine
(Intel i7-7700) for both Latin and non-Latin Strings of different sizes and
counts.
Here are the results:

https://github.com/stsypanov/string-chain/blob/master/results/StringBuilderVsStringChainBenchmark.txt

There are a few corner cases (e.g. 1000 Strings of length 1 appended) where
StringBuilder outperforms a StringChain constructed with the default capacity
of 8, but a StringChain constructed with the exact count of added Strings
almost always wins, especially when dealing with non-Latin chars (Russian in
my case).

I've also created a separate repo on GitHub with benchmarks:

https://github.com/stsypanov/string-chain

The key feature here is the ability to allocate a String array of exact size
in cases where we know the number of added elements.
Thus, if the change is accepted, I think we can also add an overloaded method
String.chain(Collection<CharSequence>), as Collection::size allows us to
construct a StringChain of exact size.
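
A rough sketch of how such a Collection overload might look. It is written as
a standalone helper (ChainSketch is a hypothetical name) rather than against
the StringChain class from the patch, and it assumes the collection is not
modified while it is being iterated.

```java
import java.util.Collection;

// Hypothetical standalone helper, for illustration only.
final class ChainSketch {

    static String chain(Collection<? extends CharSequence> elements) {
        // Collection::size gives the element count up front,
        // so the String[] never needs to grow.
        final String[] parts = new String[elements.size()];
        int count = 0;
        int totalLength = 0;
        for (CharSequence cs : elements) {
            String s = String.valueOf(cs); // null becomes "null"
            parts[count++] = s;
            totalLength += s.length();
        }
        // One char[] of exact size, filled piece by piece.
        final char[] chars = new char[totalLength];
        int offset = 0;
        for (int i = 0; i < count; i++) {
            String s = parts[i];
            s.getChars(0, s.length(), chars, offset);
            offset += s.length();
        }
        return new String(chars);
    }
}
```

With this helper, ChainSketch.chain(java.util.Arrays.asList("a", "b", "c"))
returns "abc" without resizing any intermediate array.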

Patch is attached.

Kind regards,
Sergei Tsypanov
diff --git a/src/java.base/share/classes/java/lang/String.java b/src/java.base/share/classes/java/lang/String.java
--- a/src/java.base/share/classes/java/lang/String.java
+++ b/src/java.base/share/classes/java/lang/String.java
@@ -25,6 +25,7 @@
 
 package java.lang;
 
+import java.lang.StringChain;
 import java.io.ObjectStreamField;
 import java.io.UnsupportedEncodingException;
 import java.lang.annotation.Native;
@@ -2455,6 +2456,51 @@
     }
 
     /**
+     * Returns a new {@code String} composed of copies of the
+     * {@code CharSequence elements} concatenated together.
+     *
+     * @param elements an {@code Iterable} that will have its {@code elements}
+     *                 concatenated.
+     *
+     * @return a new {@code String} that is concatenated from the {@code elements}
+     *                 argument
+     *
+     * @throws NullPointerException if {@code elements} is {@code null}
+     *
+     * @since 13
+     */
+    public static String chain(Iterable<? extends CharSequence> elements) {
+        Objects.requireNonNull(elements);
+        StringChain chain = new StringChain();
+        for (CharSequence element : elements) {
+            chain.add(element);
+        }
+        return chain.toString();
+    }
+
+    /**
+     * Returns a new {@code String} composed of copies of the
+     * {@code CharSequence elements} concatenated together.
+     *
+     * @param elements the elements to concatenate.
+     *
+     * @return a new {@code String} that is concatenated from the {@code elements}
+     *                 argument
+     *
+     * @throws NullPointerException if {@code elements} is {@code null}
+     *
+     * @since 13
+     */
+    public static String chain(CharSequence... elements) {
+        Objects.requireNonNull(elements);
+        StringChain chain = new StringChain(elements.length);
+        for (CharSequence element : elements) {
+            chain.add(element);
+        }
+        return chain.toString();
+    }
+
+    /**
      * Converts all of the characters in this {@code String} to lower
      * case using the rules of the given {@code Locale}.  Case mapping is based
      * on the Unicode Standard version specified by the {@link java.lang.Character Character}
diff --git a/src/java.base/share/classes/java/lang/StringChain.java b/src/java.base/share/classes/java/lang/StringChain.java
new file mode 100644
--- /dev/null
+++ b/src/java.base/share/classes/java/lang/StringChain.java
@@ -0,0 +1,157 @@
+package java.lang;
+
+import java.util.Arrays;
+import java.util.Objects;
+
+public class StringChain {
+
+    /**
+     * Contains all the string components added so far.
+     */
+    private String[] elts;
+
+    /**
+     * The number of string components added so far.
+     */
+    private int size;
+
+    /**
+     * Total length in chars of all string components added so far.
+     */
+    private int len;
+
+    /**
+     * Constructs an empty {@code StringChain} with a default initial capacity of 8 string components.
+     * The backing array grows as needed when more components are added. If no components are added,
+     * {@link #toString()} returns the empty string and {@link #length()} returns 0. Use
+     * {@link #StringChain(int)} when the number of components is known in advance.
+     */
+    public StringChain() {
+        this(8);
+    }
+
+    /**
+     * Constructs an empty {@code StringChain} whose backing array can hold {@code chunks} components without
+     * resizing. Passing the exact number of components to be added avoids any reallocation of that array.
+     *
+     * @param chunks the initial capacity, in components; must not be negative
+     */
+    public StringChain(int chunks) {
+        if (chunks < 0) {
+            throw new IllegalArgumentException("Illegal capacity: " + chunks);
+        }
+        this.elts = new String[chunks];
+    }
+
+    private static int getChars(String s, char[] chars, int start) {
+        int len = s.length();
+        s.getChars(0, len, chars, start);
+        return len;
+    }
+
+    /**
+     * Returns the current value: the components added so far, concatenated in the order in which they were
+     * added. If no components have been added, the empty string is returned. The result is assembled in a
+     * single {@code char[]} allocated at exactly the required length.
+     *
+     * @return the string representation of this {@code StringChain}
+     */
+    @Override
+    public String toString() {
+        final int size = this.size;
+        if (size == 0) {
+            return "";
+        }
+        final String[] elts = this.elts;
+        final char[] chars = new char[len];
+        int startFrom = getChars(elts[0], chars, 0);
+        for (int i = 1; i < size; i++) {
+            startFrom += getChars(elts[i], chars, startFrom);
+        }
+        return new String(chars);
+    }
+
+    /**
+     * Adds a copy of the given {@code CharSequence} value as the next element of the {@code StringChain} value.
+     * If {@code cs} is {@code null}, then {@code "null"} is added.
+     *
+     * @param cs The element to add
+     * @return a reference to this {@code StringChain}
+     */
+    public StringChain add(CharSequence cs) {
+        final String elt = String.valueOf(cs);
+
+        if (size == elts.length) {
+            if (size == 0) {
+                elts = new String[8];
+            } else {
+                elts = Arrays.copyOf(elts, 2 * size);
+            }
+        }
+        len += elt.length();
+        elts[size++] = elt;
+        return this;
+    }
+
+    /**
+     * Adds String representation of the given {@code Object} as the next element of the {@code StringChain} value.
+     * If {@code o} is {@code null}, then {@code "null"} is added.
+     *
+     * @param  o The element to add
+     * @return a reference to this {@code StringChain}
+     */
+    public StringChain add(Object o) {
+        return this.add(String.valueOf(o));
+    }
+
+    /**
+     * Adds the contents of the given {@code StringChain} as the next element if it is non-empty. If the given
+     * {@code StringChain} is empty, the call has no effect.
+     *
+     * <p>A {@code StringChain} is empty if {@link #add(CharSequence) add()}
+     * has never been called, and if {@code merge()} has never been called with a non-empty {@code StringChain}
+     * argument.
+     *
+     * <p>The elements of the other {@code StringChain} are first concatenated into a single {@code String}
+     * (compacting the other instance as a side effect), and the result is appended to this
+     * {@code StringChain} as a single element.
+     *
+     * @param other The {@code StringChain} whose contents should be merged into this one
+     * @return This {@code StringChain}
+     * @throws NullPointerException if the other {@code StringChain} is null
+     */
+    public StringChain merge(StringChain other) {
+        Objects.requireNonNull(other);
+        if (other.size == 0) {
+            return this;
+        }
+        other.compactElts();
+        return add(other.elts[0]);
+    }
+
+    private void compactElts() {
+        if (size > 1) {
+            final char[] chars = new char[len];
+            int i = 1;
+            int startFrom = getChars(elts[0], chars, 0);
+            do {
+                startFrom += getChars(elts[i], chars, startFrom);
+                elts[i] = null;
+            } while (++i < size);
+            size = 1;
+            elts[0] = new String(chars);
+        }
+    }
+
+    /**
+     * Returns the length of the {@code String} representation of this {@code StringChain}, that is, the
+     * total number of chars in all components added so far, or 0 if nothing has been added. The value is
+     * equivalent to {@code toString().length()}.
+     *
+     * @return the length of the current value of {@code StringChain}
+     */
+    public int length() {
+        return size == 0 ? 0 : len;
+    }
+
+}
diff --git a/src/java.base/share/classes/java/util/stream/Collectors.java b/src/java.base/share/classes/java/util/stream/Collectors.java
--- a/src/java.base/share/classes/java/util/stream/Collectors.java
+++ b/src/java.base/share/classes/java/util/stream/Collectors.java
@@ -24,6 +24,7 @@
  */
 package java.util.stream;
 
+import java.lang.StringChain;
 import java.util.AbstractMap;
 import java.util.AbstractSet;
 import java.util.ArrayList;
@@ -365,10 +366,10 @@
      * {@code String}, in encounter order
      */
     public static Collector<CharSequence, ?, String> joining() {
-        return new CollectorImpl<CharSequence, StringBuilder, String>(
-                StringBuilder::new, StringBuilder::append,
-                (r1, r2) -> { r1.append(r2); return r1; },
-                StringBuilder::toString, CH_NOID);
+        return new CollectorImpl<CharSequence, StringChain, String>(
+                StringChain::new, StringChain::add,
+                (r1, r2) -> { r1.add(r2); return r1; },
+                StringChain::toString, CH_NOID);
     }
 
     /**
