Re: Optimize StringConcatHelper::simpleConcat for empty strings

Tagir Valeev Sun, 14 Jun 2020 20:59:02 -0700

Hello, Peter and Claes!

> Cache locality might suffer a bit down the line since
> String and byte[] becomes less likely to reside in the same slice
> of heap memory.


There's a JEP 192 String Deduplication in G1 that does similar
optimization inside the GC (currently it's off by default).
The 'Risk and assumptions' section reads as:

> Normally a String object and its corresponding character array will be placed 
> next to each other in memory, leading to good cache locality. After 
> deduplicating a String object its character array will be further away, which 
> might introduce a slight performance overhead. Initial measurements have, 
> however, shown that this does not seem to be a problem. In fact, after 
> deduplication the total amount of accessed memory is reduced which tends to 
> result in improved cache hit rates.

So I believe it would not be a big problem in our case either.

> You could simplify as "return new String(s{2,1});".

Thanks! Totally forgot about that String constructor.

Peter,
> Also, for cases where one argument evaluates to empty string and the
> other is not a String, I think you could simply return the seconds
> string (not creating another instance with reused value/coder)\

I believe, this may still violate the spec. The resulting object must
be a 'newly created' while the custom toString() may return cached or
constant string. So we still need to copy it. This also applies to the
Claes' unaryConcat:

+    @ForceInline
+    static String unaryConcat(Object first) {
+        if (first instanceof String) {
+            // Must produce a new String
+            return new String((String)first);
+        } else {
+            return stringOf(first);
+        }
+    }

It should be just
// Must produce a new String
return new String(stringOf(first));

So should I file a separate JBS issue for my optimization or both
cases (constant empty string and non-constant empty string) could be
handled under a single issue?

With best regards,
Tagir Valeev.

On Mon, Jun 15, 2020 at 4:04 AM Peter Levart <peter.lev...@gmail.com> wrote:
>
> Hi Tagir,
>
>
> I think that there might be cases where one of the arguments of
> concatenation is constant empty String. I found myself sometimes doing
> "" + someNonStringValue (instead of more readable
> String.valueOf(someNonStringValue)) simply because of laziness. So does
> this optimization help in that case as much as with non-constant "" +
> "longlonglongline" ?
>
>
> Also, for cases where one argument evaluates to empty string and the
> other is not a String, I think you could simply return the seconds
> string (not creating another instance with reused value/coder):
>
>
> if (s1.isEmpty()) {
>
>    return s2 == second ? new String(s2.value(), s2.coder()) : s2;
>
> }
>
>
> Regards, Peter
>
>
> On 6/13/20 7:08 AM, Tagir Valeev wrote:
> > Hello!
> >
> > It's quite possible that when we concatenate two strings, one of them
> > appears to be empty. We cannot simply return another string in this
> > case, as JLS 15.18.1 explicitly says (for unknown to me reason) about
> > the result of the string concatenation expression that 'The String
> > object is newly created'. However, it's still possible to reuse the
> > internal array of another string to reduce allocations:
> >
> > --- src/java.base/share/classes/java/lang/StringConcatHelper.java
> > (revision 58998:04e3d254c76be87788a40cbd66d013140ea951d8)
> > +++ src/java.base/share/classes/java/lang/StringConcatHelper.java
> > (revision 58998+:04e3d254c76b+)
> > @@ -420,6 +420,12 @@
> >       static String simpleConcat(Object first, Object second) {
> >           String s1 = stringOf(first);
> >           String s2 = stringOf(second);
> > +        if (s1.isEmpty()) {
> > +            return new String(s2.value(), s2.coder());
> > +        }
> > +        if (s2.isEmpty()) {
> > +            return new String(s1.value(), s1.coder());
> > +        }
> >           // start "mixing" in length and coder or arguments, order is not
> >           // important
> >           long indexCoder = mix(initialCoder(), s2);
> >
> > Very simple benchmark like this validates that the concatenation
> > became faster if one of the strings is empty:
> >
> >    @Param({"", "longlonglongline"})
> >    String data;
> >
> >    @Param({"", "longlonglongline"})
> >    String data2;
> >
> >    @Benchmark
> >    public String plus() {
> >      return data + data2;
> >    }
> >
> > Without patch I observe on VM 15-ea+20-899:
> >
> > Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  Units
> > plus                                      avgt   30  15,335 ± 0,186  ns/op
> > plus                    longlonglongline  avgt   30  19,867 ± 0,109  ns/op
> > plus  longlonglongline                    avgt   30  20,283 ± 0,230  ns/op
> > plus  longlonglongline  longlonglongline  avgt   30  26,047 ± 0,230  ns/op
> >
> > With patch:
> > Benchmark       (data)           (data2)  Mode  Cnt   Score   Error  Units
> > plus                                      avgt   30   6,668 ± 0,055  ns/op
> > plus                    longlonglongline  avgt   30   6,708 ± 0,114  ns/op
> > plus  longlonglongline                    avgt   30   7,003 ± 0,064  ns/op
> > plus  longlonglongline  longlonglongline  avgt   30  25,126 ± 0,392  ns/op
> >
> > There could be an added cost of up to two branches for the normal case
> > (I believe, if one of the strings is constant, then decent JIT can
> > eliminate one of the branches). However, I believe, the benefit could
> > outweigh it, as empty strings are not that uncommon and in this case,
> > we reduce O(N) time and memory consumption to O(1).
> >
> > What do you think? Is this a reasonable thing to do? I can file an
> > issue and submit a proper webrev if it looks like a useful patch.
> >
> > With best regards,
> > Tagir Valeev.

Re: Optimize StringConcatHelper::simpleConcat for empty strings

Reply via email to