Hello,

I've reworked the code, patch is attached. Could you please review my solution 
regarding usage of SahredSecrets?

P.S. After I've created the patch it came to my mind that instead of checking 
all Strings when calling StringJoiner.add()
we can check them in toString() method and fail-fast in case at least one of 
them is non-latin. This likely to reduce 
regression related to usage of reflection.

Regards,
Sergey Tsypanov

05.02.2020, 23:21, "fo...@univ-mlv.fr" <fo...@univ-mlv.fr>:
> ----- Mail original -----
>>  De: "Сергей Цыпанов" <sergei.tsypa...@yandex.ru>
>>  À: "Remi Forax" <fo...@univ-mlv.fr>, "core-libs-dev" 
>> <core-libs-dev@openjdk.java.net>
>>  Envoyé: Mercredi 5 Février 2020 22:12:34
>>  Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner
>
>>  Hello,
>>
>>>  If you want to optimize StringJoiner, the best way to do it is to use the 
>>> shared
>>>  secret mechanism so a java.util class can see implementation details of a
>>>  java.lang class without exposing those details publicly.
>>>  As an example, take a look to EnumSet and its implementations.
>>
>>  I've looked into SharedSecrets, it seems there's no ready-to-use method for
>>  accessing package-private method. Do you mean it's necessary to add 
>> particular
>>  functionality to JavaLangReflectionAccess as they did for JavaLangAccess in
>>  order to deal with EnumSet?
>
> yes !
> crossing package boundary in a non public way is not free,
> but given that StringJoiner is used quite often (directly or indirectly using 
> Collectors.joining()), it may worth the cost.
>
>>  Regards,
>>  Sergey
>
> Regards,
> Rémi
>
>>  04.02.2020, 12:12, "Remi Forax" <fo...@univ-mlv.fr>:
>>>  ----- Mail original -----
>>>>   De: "Сергей Цыпанов" <sergei.tsypa...@yandex.ru>
>>>>   À: "jonathan gibbons" <jonathan.gibb...@oracle.com>, "core-libs-dev"
>>>>   <core-libs-dev@openjdk.java.net>
>>>>   Envoyé: Mardi 4 Février 2020 08:53:31
>>>>   Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner
>>>
>>>>   Hello,
>>>
>>>  Hi Sergey,
>>>
>>>>   I'd probably agree about a new class in java.lang, but what is wrong 
>>>> about
>>>>   exposing package-private method
>>>>   which doesn't modify the state of the object and has no side effects?
>>>
>>>  You can not change the implementation anymore,
>>>  by example if instead of having a split between latin1 and non latin1, we 
>>> decide
>>>  in the future to split between utf8 and non utf8.
>>>
>>>  If you want to optimize StringJoiner, the best way to do it is to use the 
>>> shared
>>>  secret mechanism so a java.util class can see implementation details of a
>>>  java.lang class without exposing those details publicly.
>>>  As an example, take a look to EnumSet and its implementations.
>>>
>>>  regards,
>>>  Rémi
>>>
>>>>   04.02.2020, 00:58, "Jonathan Gibbons" <jonathan.gibb...@oracle.com>:
>>>>>   Sergey,
>>>>>
>>>>>   It is equally bad to create a new class in the java.lang package as it
>>>>>   is to add a new public method to java.lang.String.
>>>>>
>>>>>   -- Jon
>>>>>
>>>>>   On 2/3/20 2:38 PM, Сергей Цыпанов wrote:
>>>>>>    Hello,
>>>>>>
>>>>>>    as of JDK14 java.util.StringJoiner still uses char[] as a storage of 
>>>>>> glued
>>>>>>    Strings.
>>>>>>
>>>>>>    This applies for the cases when all joined Strings as well as 
>>>>>> delimiter, prefix
>>>>>>    and suffix contain only ASCII symbols.
>>>>>>
>>>>>>    As a result when StringJoiner.toString() is invoked, byte[] stored in 
>>>>>> String is
>>>>>>    inflated in order to fill in char[] and
>>>>>>    finally char[] is compressed when constructor of String is called:
>>>>>>
>>>>>>    String delimiter = this.delimiter;
>>>>>>    char[] chars = new char[this.len + addLen];
>>>>>>    int k = getChars(this.prefix, chars, 0);
>>>>>>    if (size > 0) {
>>>>>>         k += getChars(elts[0], chars, k); // inflate byte[] -> char[]
>>>>>>
>>>>>>         for(int i = 1; i < size; ++i) {
>>>>>>             k += getChars(delimiter, chars, k);
>>>>>>             k += getChars(elts[i], chars, k);
>>>>>>         }
>>>>>>    }
>>>>>>
>>>>>>    k += getChars(this.suffix, chars, k);
>>>>>>    return new String(chars); // compress char[] -> byte[]
>>>>>>
>>>>>>    This can be improved by detecting cases when String.isLatin1() 
>>>>>> returns true for
>>>>>>    all involved Strings.
>>>>>>
>>>>>>    I've prepared a patch along with benchmark proving that this change 
>>>>>> is correct
>>>>>>    and brings improvement.
>>>>>>    The only concern I have is about String.isLatin1(): as far as String 
>>>>>> belongs to
>>>>>>    java.lang and StringJoiner to java.util
>>>>>>    package-private String.isLatin1() cannot be directly accessed, we 
>>>>>> need to make
>>>>>>    it public for successful compilation.
>>>>>>
>>>>>>    Another solution is to create an intermediate utility class located 
>>>>>> in java.lang
>>>>>>    which delegates the call to String.isLatin1():
>>>>>>
>>>>>>    package java.lang;
>>>>>>
>>>>>>    public class StringHelper {
>>>>>>         public static boolean isLatin1(String str) {
>>>>>>             return str.isLatin1();
>>>>>>         }
>>>>>>    }
>>>>>>
>>>>>>    This allows to keep java.lang.String intact and have access to it's
>>>>>>    package-private method outside of java.lang package.
>>>>>>
>>>>>>    Below I've added results of benchmarking for specified case (all 
>>>>>> Strings are
>>>>>>    Latin1). The other case (at least one String is UTF-8) uses existing 
>>>>>> code so
>>>>>>    there will be only a tiny regression due to several if-checks.
>>>>>>
>>>>>>    With best regards,
>>>>>>    Sergey Tsypanov
>>>>>>
>>>>>>                                               (count) (length) Original 
>>>>>> Patched Units
>>>>>>    stringJoiner 1 1 26.7 ± 1.3 38.2 ± 1.1 ns/op
>>>>>>    stringJoiner 1 5 27.4 ± 0.0 40.5 ± 2.2 ns/op
>>>>>>    stringJoiner 1 10 29.6 ± 1.9 38.4 ± 1.9 ns/op
>>>>>>    stringJoiner 1 100 61.1 ± 6.9 47.6 ± 0.6 ns/op
>>>>>>    stringJoiner 5 1 91.1 ± 6.7 83.6 ± 2.0 ns/op
>>>>>>    stringJoiner 5 5 96.1 ± 10.7 85.6 ± 1.1 ns/op
>>>>>>    stringJoiner 5 10 105.5 ± 14.3 84.7 ± 1.1 ns/op
>>>>>>    stringJoiner 5 100 266.6 ± 30.1 139.6 ± 14.0 ns/op
>>>>>>    stringJoiner 10 1 190.7 ± 23.0 162.0 ± 2.9 ns/op
>>>>>>    stringJoiner 10 5 200.0 ± 16.9 167.5 ± 11.0 ns/op
>>>>>>    stringJoiner 10 10 216.4 ± 12.4 164.8 ± 1.7 ns/op
>>>>>>    stringJoiner 10 100 545.3 ± 49.7 282.2 ± 12.0 ns/op
>>>>>>    stringJoiner 100 1 1467.0 ± 90.3 1302.0 ± 18.5 ns/op
>>>>>>    stringJoiner 100 5 1491.8 ± 166.2 1493.0 ± 135.4 ns/op
>>>>>>    stringJoiner 100 10 1768.8 ± 160.6 1760.8 ± 111.4 ns/op
>>>>>>    stringJoiner 100 100 3654.3 ± 113.1 3120.9 ± 175.9 ns/op
>>>>>>
>>>>>>    stringJoiner:·gc.alloc.rate.norm 1 1 120.0 ± 0.0 120.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 1 5 128.0 ± 0.0 120.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 1 10 144.0 ± 0.0 136.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 1 100 416.0 ± 0.0 312.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 5 1 144.0 ± 0.0 136.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 5 5 200.0 ± 0.0 168.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 5 10 272.0 ± 0.0 216.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 5 100 1632.0 ± 0.0 1128.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 10 1 256.0 ± 0.0 232.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 10 5 376.0 ± 0.0 312.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 10 10 520.0 ± 0.0 408.0 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 10 100 3224.1 ± 0.0 2216.1 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 100 1 1760.2 ± 14.9 1544.2 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 100 5 2960.3 ± 14.9 2344.2 ± 0.0 B/op
>>>>>>    stringJoiner:·gc.alloc.rate.norm 100 10 4440.4 ± 0.0 3336.3 ± 0.0 B/op
>>  >>  >>  stringJoiner:·gc.alloc.rate.norm 100 100 31449.3 ± 12.2 21346.7 ± 
>> 14.7 B/op
diff --git a/src/java.base/share/classes/java/util/StringJoiner.java b/src/java.base/share/classes/java/util/StringJoiner.java
--- a/src/java.base/share/classes/java/util/StringJoiner.java
+++ b/src/java.base/share/classes/java/util/StringJoiner.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2013, 2018, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2013, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -24,6 +24,8 @@
  */
 package java.util;
 
+import jdk.internal.access.SharedSecrets;
+
 /**
  * {@code StringJoiner} is used to construct a sequence of characters separated
  * by a delimiter and optionally starting with a supplied prefix
@@ -60,8 +62,8 @@
  *
  * @see java.util.stream.Collectors#joining(CharSequence)
  * @see java.util.stream.Collectors#joining(CharSequence, CharSequence, CharSequence)
- * @since  1.8
-*/
+ * @since 1.8
+ */
 public final class StringJoiner {
     private final String prefix;
     private final String delimiter;
@@ -77,12 +79,14 @@
     private int len;
 
     /**
-     * When overridden by the user to be non-null via {@link setEmptyValue}, the
+     * When overridden by the user to be non-null via {@link #setEmptyValue}, the
      * string returned by toString() when no elements have yet been added.
      * When null, prefix + suffix is used as the empty value.
      */
     private String emptyValue;
 
+    private boolean allLatin1;
+
     /**
      * Constructs a {@code StringJoiner} with no characters in it, with no
      * {@code prefix} or {@code suffix}, and a copy of the supplied
@@ -125,6 +129,7 @@
         this.prefix = prefix.toString();
         this.delimiter = delimiter.toString();
         this.suffix = suffix.toString();
+        this.allLatin1 = isLatin1(this.prefix) && isLatin1(this.delimiter) && isLatin1(this.suffix);
     }
 
     /**
@@ -144,6 +149,7 @@
     public StringJoiner setEmptyValue(CharSequence emptyValue) {
         this.emptyValue = Objects.requireNonNull(emptyValue,
             "The empty value must not be null").toString();
+        this.allLatin1 &= isLatin1(this.emptyValue);
         return this;
     }
 
@@ -153,6 +159,13 @@
         return len;
     }
 
+    @SuppressWarnings("deprecation")
+    private static int getBytes(String s, byte[] bytes, int start) {
+        int len = s.length();
+        s.getBytes(0, len, bytes, start);
+        return len;
+    }
+
     /**
      * Returns the current value, consisting of the {@code prefix}, the values
      * added so far separated by the {@code delimiter}, and the {@code suffix},
@@ -173,6 +186,13 @@
             compactElts();
             return size == 0 ? "" : elts[0];
         }
+        if (allLatin1) {
+            return bytesToString(elts, size, addLen);
+        }
+        return charsToString(elts, size, addLen);
+    }
+
+    private String charsToString(String[] elts, int size, int addLen) {
         final String delimiter = this.delimiter;
         final char[] chars = new char[len + addLen];
         int k = getChars(prefix, chars, 0);
@@ -183,10 +203,25 @@
                 k += getChars(elts[i], chars, k);
             }
         }
-        k += getChars(suffix, chars, k);
+        getChars(suffix, chars, k);
         return new String(chars);
     }
 
+    private String bytesToString(String[] elts, int size, int addLen) {
+        final String delimiter = this.delimiter;
+        final byte[] bytes = new byte[len + addLen];
+        int k = getBytes(prefix, bytes, 0);
+        if (size > 0) {
+            k += getBytes(elts[0], bytes, k);
+            for (int i = 1; i < size; i++) {
+                k += getBytes(delimiter, bytes, k);
+                k += getBytes(elts[i], bytes, k);
+            }
+        }
+        getBytes(suffix, bytes, k);
+        return new String(bytes);
+    }
+
     /**
      * Adds a copy of the given {@code CharSequence} value as the next
      * element of the {@code StringJoiner} value. If {@code newElement} is
@@ -206,6 +241,7 @@
         }
         len += elt.length();
         elts[size++] = elt;
+        allLatin1 &= isLatin1(elt);
         return this;
     }
 
@@ -239,18 +275,39 @@
 
     private void compactElts() {
         if (size > 1) {
-            final char[] chars = new char[len];
-            int i = 1, k = getChars(elts[0], chars, 0);
-            do {
-                k += getChars(delimiter, chars, k);
-                k += getChars(elts[i], chars, k);
-                elts[i] = null;
-            } while (++i < size);
-            size = 1;
-            elts[0] = new String(chars);
+            if (allLatin1) {
+                compactBytes();
+            } else {
+                compactChars();
+            }
         }
     }
 
+    private void compactChars() {
+        final char[] chars = new char[len];
+        int i = 1, k = getChars(elts[0], chars, 0);
+        do {
+            k += getChars(delimiter, chars, k);
+            k += getChars(elts[i], chars, k);
+            elts[i] = null;
+        } while (++i < size);
+        size = 1;
+        elts[0] = new String(chars);
+    }
+
+    private void compactBytes() {
+        final byte[] bytes = new byte[len];
+        int i = 1;
+        int k = getBytes(elts[0], bytes, 0);
+        do {
+            k += getBytes(delimiter, bytes, k);
+            k += getBytes(elts[i], bytes, k);
+            elts[i] = null;
+        } while (++i < size);
+        size = 1;
+        elts[0] = new String(bytes);
+    }
+
     /**
      * Returns the length of the {@code String} representation
      * of this {@code StringJoiner}. Note that if
@@ -265,4 +322,8 @@
         return (size == 0 && emptyValue != null) ? emptyValue.length() :
             len + prefix.length() + suffix.length();
     }
+
+    private static boolean isLatin1(String str) {
+        return SharedSecrets.getJavaLangStringAccess().isLatin1(str);
+    }
 }
diff --git a/src/java.base/share/classes/jdk/internal/access/JavaLangStringAccess.java b/src/java.base/share/classes/jdk/internal/access/JavaLangStringAccess.java
new file mode 100644
--- /dev/null
+++ b/src/java.base/share/classes/jdk/internal/access/JavaLangStringAccess.java
@@ -0,0 +1,41 @@
+/*
+ * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation.  Oracle designates this
+ * particular file as subject to the "Classpath" exception as provided
+ * by Oracle in the LICENSE file that accompanied this code.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ */
+
+package jdk.internal.access;
+
+/**
+ * An interface which gives access to internals of {@link String}.
+ */
+public interface JavaLangStringAccess {
+
+    /**
+     * Returns true in case all characters of argument {@link String} are ASCII symbols
+     *
+     * @param str String to be checked
+     * @return whether all characters of argument {@link String} are ASCII symbols
+     */
+    boolean isLatin1(String str);
+
+}
diff --git a/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java b/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java
--- a/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java
+++ b/src/java.base/share/classes/jdk/internal/access/SharedSecrets.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2002, 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2002, 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * This code is free software; you can redistribute it and/or modify it
@@ -58,6 +58,7 @@
     private static JavaLangModuleAccess javaLangModuleAccess;
     private static JavaLangRefAccess javaLangRefAccess;
     private static JavaLangReflectAccess javaLangReflectAccess;
+    private static JavaLangStringAccess javaLangStringAccess;
     private static JavaIOAccess javaIOAccess;
     private static JavaIOFileDescriptorAccess javaIOFileDescriptorAccess;
     private static JavaIOFilePermissionAccess javaIOFilePermissionAccess;
@@ -139,6 +140,17 @@
         return javaLangReflectAccess;
     }
 
+    public static JavaLangStringAccess getJavaLangStringAccess() {
+        if (javaLangStringAccess == null) {
+            unsafe.ensureClassInitialized(StringAccess.class);
+        }
+        return javaLangStringAccess;
+    }
+
+    public static void setJavaLangStringAccess(JavaLangStringAccess stringAccess) {
+        javaLangStringAccess = stringAccess;
+    }
+
     public static void setJavaNetUriAccess(JavaNetUriAccess jnua) {
         javaNetUriAccess = jnua;
     }
diff --git a/src/java.base/share/classes/jdk/internal/access/StringAccess.java b/src/java.base/share/classes/jdk/internal/access/StringAccess.java
new file mode 100644
--- /dev/null
+++ b/src/java.base/share/classes/jdk/internal/access/StringAccess.java
@@ -0,0 +1,67 @@
+/*
+ * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
+ * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
+ *
+ * This code is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 only, as
+ * published by the Free Software Foundation.  Oracle designates this
+ * particular file as subject to the "Classpath" exception as provided
+ * by Oracle in the LICENSE file that accompanied this code.
+ *
+ * This code is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * version 2 for more details (a copy is included in the LICENSE file that
+ * accompanied this code).
+ *
+ * You should have received a copy of the GNU General Public License version
+ * 2 along with this work; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
+ * or visit www.oracle.com if you need additional information or have any
+ * questions.
+ */
+
+package jdk.internal.access;
+
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+
+/**
+ * Package-private class implementing the
+ * jdk.internal.access.JavaLangStringAccess interface,
+ * allowing non-members of java.lang package
+ * to access internals of {@link java.lang.String}.
+ * */
+class StringAccess implements jdk.internal.access.JavaLangStringAccess {
+
+    static {
+        SharedSecrets.setJavaLangStringAccess(new StringAccess());
+    }
+
+    private final Method isLatin1;
+
+    StringAccess() {
+        this.isLatin1 = initIsLatin1Method();
+    }
+
+    @Override
+    public boolean isLatin1(String str) {
+        try {
+            return (boolean) isLatin1.invoke(str);
+        } catch (IllegalAccessException | InvocationTargetException e) {
+            throw new RuntimeException(e);
+        }
+    }
+
+    private static Method initIsLatin1Method() {
+        try {
+            final Method isLatin1 = String.class.getDeclaredMethod("isLatin1");
+            isLatin1.setAccessible(true);
+            return isLatin1;
+        } catch (NoSuchMethodException e) {
+            throw new Error(e);
+        }
+    }
+}

Reply via email to