Hello Tagir, FYI, there's a full table of results for original/patched benchmarks:
Benchmark (count) (latin) (length) Original Patched Units stringJoiner 1 true 1 26.9 ± 0.7 48.8 ± 2.2 ns/op stringJoiner 1 true 5 30.5 ± 1.0 46.1 ± 2.1 ns/op stringJoiner 1 true 10 31.2 ± 0.6 47.3 ± 1.3 ns/op stringJoiner 1 true 100 62.5 ± 3.3 79.9 ± 4.8 ns/op stringJoiner 5 true 1 78.2 ± 1.6 110.3 ± 2.9 ns/op stringJoiner 5 true 5 94.2 ± 8.7 116.6 ± 0.7 ns/op stringJoiner 5 true 10 95.3 ± 6.9 100.1 ± 0.4 ns/op stringJoiner 5 true 100 188.0 ± 10.2 136.0 ± 0.4 ns/op stringJoiner 10 true 1 160.3 ± 4.5 172.9 ± 0.8 ns/op stringJoiner 10 true 5 169.0 ± 4.7 180.2 ± 9.1 ns/op stringJoiner 10 true 10 205.7 ± 16.4 182.7 ± 1.1 ns/op stringJoiner 10 true 100 366.5 ± 17.0 284.5 ± 3.1 ns/op stringJoiner 100 true 1 1117.6 ± 11.1 2123.7 ± 11.1 ns/op stringJoiner 100 true 5 1270.7 ± 40.2 2163.6 ± 12.4 ns/op stringJoiner 100 true 10 1364.4 ± 14.0 2283.8 ± 16.1 ns/op stringJoiner 100 true 100 3592.9 ± 164.8 3535.2 ± 29.9 ns/op stringJoiner 1 false 1 35.6 ± 1.2 59.1 ± 3.0 ns/op stringJoiner 1 false 5 39.3 ± 1.2 52.6 ± 2.5 ns/op stringJoiner 1 false 10 42.2 ± 1.6 53.6 ± 0.3 ns/op stringJoiner 1 false 100 70.5 ± 1.8 86.4 ± 0.4 ns/op stringJoiner 5 false 1 89.0 ± 3.5 102.2 ± 1.0 ns/op stringJoiner 5 false 5 87.6 ± 0.7 106.5 ± 1.2 ns/op stringJoiner 5 false 10 109.0 ± 5.6 116.5 ± 1.2 ns/op stringJoiner 5 false 100 324.0 ± 16.5 221.9 ± 0.5 ns/op stringJoiner 10 false 1 183.9 ± 5.9 204.7 ± 5.5 ns/op stringJoiner 10 false 5 198.7 ± 9.7 202.4 ± 1.5 ns/op stringJoiner 10 false 10 196.7 ± 6.9 226.7 ± 6.4 ns/op stringJoiner 10 false 100 535.8 ± 2.3 553.0 ± 5.6 ns/op stringJoiner 100 false 1 1674.6 ± 122.1 1940.8 ± 16.2 ns/op stringJoiner 100 false 5 1791.9 ± 58.1 2158.1 ± 12.0 ns/op stringJoiner 100 false 10 2124.1 ± 193.3 2364.0 ± 25.2 ns/op stringJoiner 100 false 100 4323.4 ± 29.2 4675.5 ± 11.8 ns/op stringJoiner:·gc.alloc.rate.norm 1 true 1 120.0 ± 0.0 120.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 1 true 5 128.0 ± 0.0 120.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 1 true 10 144.0 ± 0.0 136.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 1 true 100 416.0 ± 0.0 312.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 5 true 1 144.0 ± 0.0 136.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 5 true 5 200.0 ± 0.0 168.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 5 true 10 272.0 ± 0.0 216.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 5 true 100 1632.0 ± 0.0 1128.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 10 true 1 256.0 ± 0.0 232.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 10 true 5 376.0 ± 0.0 316.8 ± 4.9 B/op stringJoiner:·gc.alloc.rate.norm 10 true 10 520.0 ± 0.0 408.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 10 true 100 3224.1 ± 0.0 2236.9 ± 21.2 B/op stringJoiner:·gc.alloc.rate.norm 100 true 1 1748.1 ± 4.0 1592.2 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 100 true 5 2948.2 ± 4.0 2392.3 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 100 true 10 4444.3 ± 4.0 3384.3 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 100 true 100 31441.4 ± 0.0 21385.4 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 1 false 1 144.0 ± 0.0 144.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 1 false 5 160.0 ± 0.0 160.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 1 false 10 184.0 ± 0.0 184.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 1 false 100 640.0 ± 0.0 640.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 5 false 1 184.0 ± 0.0 184.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 5 false 5 280.0 ± 0.0 280.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 5 false 10 400.0 ± 0.0 400.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 5 false 100 2664.1 ± 0.0 2664.1 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 10 false 1 320.0 ± 0.0 334.4 ± 7.4 B/op stringJoiner:·gc.alloc.rate.norm 10 false 5 520.0 ± 0.0 520.0 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 10 false 10 760.0 ± 0.0 769.6 ± 6.5 B/op stringJoiner:·gc.alloc.rate.norm 10 false 100 5264.2 ± 0.0 5273.8 ± 6.5 B/op stringJoiner:·gc.alloc.rate.norm 100 false 1 2204.2 ± 4.0 2216.3 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 100 false 5 4196.3 ± 6.2 4216.4 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 100 false 10 6696.5 ± 5.4 6712.6 ± 0.0 B/op stringJoiner:·gc.alloc.rate.norm 100 false 100 51702.0 ± 4.1 51714.4 ± 0.0 B/op > How would you explain that this code change doesn't improve the > performance for given count and length? Indeed, there are cases when we have degradation of performance, actually almost all cases when all the Strings are non-latin (param latin = false in benchmark) perform worse than with original code. The reason for that is current suboptimal implementation in my patch: I do latin1 check for each added String, applying reflection which is redundant as we need only one non-latin String to conclude about fallback to char[]. I will change this and remeasure. > Instead, I would experiment with pre-sized StringBuilder inside compactElts, instead of char[] array I think this is worth-to-check idea. Btw, it's interesting that in some cases (a few comparatively short Strings) most of the work is done in StringJoiner.add(), not in StringJoiner.toString(). ns percent samples top ---------- ------- ------- --- 16320527091 53.58% 1632 java.util.StringJoiner.add 10060352690 33.03% 1006 java.util.StringJoiner.toString 1120064343 3.68% 112 tsypanov.strings.source.string.Joiner.joinWithStringJoiner 970044951 3.18% 97 tsypanov.strings.join.generated.StringJoinerBenchmark_stringJoiner_jmhTest.stringJoiner_avgt_jmhStub 899983151 2.95% 90 org.openjdk.jmh.infra.Blackhole.consume 190021776 0.62% 19 tsypanov.strings.join.StringJoinerBenchmark.stringJoiner Regards, Sergey Tsypanov 10.02.2020, 14:08, "Tagir Valeev" <amae...@gmail.com>: > Hello! > > In many tests, I see little or no performance improvements. E.g.: > stringJoiner 100 10 1768.8 ± > 160.6 1760.8 ± 111.4 ns/op > > How would you explain that this code change doesn't improve the > performance for given count and length? > > Also, you are using `new String(bytes)` assuming that the platform > charset is latin1-compatible. This is not always true, so your code > would produce incorrect results depending on this setting. In general, > decoding via charset is tons of extra work. Instead, I would > experiment with pre-sized StringBuilder inside compactElts, instead of > char[] array. While it does some extra checks, StringBuilder already > can use the compact string representation, so if it's properly > pre-sized, no extra memory will be allocated. > > Finally, if you optimize for the case when X is true you should always > benchmark what happens when X is false. It's great that in some cases > we see a speedup for latin1 strings. But what if not all of the > strings are latin1? Is there any performance degradation? If yes, can > we tolerate it? > > With best regards, > Tagir Valeev. > > On Mon, Feb 10, 2020 at 3:22 PM Сергей Цыпанов > <sergei.tsypa...@yandex.ru> wrote: >> Hello, >> >> I've reworked the code, patch is attached. Could you please review my >> solution regarding usage of SahredSecrets? >> >> P.S. After I've created the patch it came to my mind that instead of >> checking all Strings when calling StringJoiner.add() >> we can check them in toString() method and fail-fast in case at least one >> of them is non-latin. This likely to reduce >> regression related to usage of reflection. >> >> Regards, >> Sergey Tsypanov >> >> 05.02.2020, 23:21, "fo...@univ-mlv.fr" <fo...@univ-mlv.fr>: >> > ----- Mail original ----- >> >> De: "Сергей Цыпанов" <sergei.tsypa...@yandex.ru> >> >> À: "Remi Forax" <fo...@univ-mlv.fr>, "core-libs-dev" >> <core-libs-dev@openjdk.java.net> >> >> Envoyé: Mercredi 5 Février 2020 22:12:34 >> >> Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner >> > >> >> Hello, >> >> >> >>> If you want to optimize StringJoiner, the best way to do it is to use >> the shared >> >>> secret mechanism so a java.util class can see implementation details of >> a >> >>> java.lang class without exposing those details publicly. >> >>> As an example, take a look to EnumSet and its implementations. >> >> >> >> I've looked into SharedSecrets, it seems there's no ready-to-use method >> for >> >> accessing package-private method. Do you mean it's necessary to add >> particular >> >> functionality to JavaLangReflectionAccess as they did for JavaLangAccess >> in >> >> order to deal with EnumSet? >> > >> > yes ! >> > crossing package boundary in a non public way is not free, >> > but given that StringJoiner is used quite often (directly or indirectly >> using Collectors.joining()), it may worth the cost. >> > >> >> Regards, >> >> Sergey >> > >> > Regards, >> > Rémi >> > >> >> 04.02.2020, 12:12, "Remi Forax" <fo...@univ-mlv.fr>: >> >>> ----- Mail original ----- >> >>>> De: "Сергей Цыпанов" <sergei.tsypa...@yandex.ru> >> >>>> À: "jonathan gibbons" <jonathan.gibb...@oracle.com>, "core-libs-dev" >> >>>> <core-libs-dev@openjdk.java.net> >> >>>> Envoyé: Mardi 4 Février 2020 08:53:31 >> >>>> Objet: Re: [PATCH] Enhancement proposal for java.util.StringJoiner >> >>> >> >>>> Hello, >> >>> >> >>> Hi Sergey, >> >>> >> >>>> I'd probably agree about a new class in java.lang, but what is wrong >> about >> >>>> exposing package-private method >> >>>> which doesn't modify the state of the object and has no side effects? >> >>> >> >>> You can not change the implementation anymore, >> >>> by example if instead of having a split between latin1 and non latin1, >> we decide >> >>> in the future to split between utf8 and non utf8. >> >>> >> >>> If you want to optimize StringJoiner, the best way to do it is to use >> the shared >> >>> secret mechanism so a java.util class can see implementation details of >> a >> >>> java.lang class without exposing those details publicly. >> >>> As an example, take a look to EnumSet and its implementations. >> >>> >> >>> regards, >> >>> Rémi >> >>> >> >>>> 04.02.2020, 00:58, "Jonathan Gibbons" <jonathan.gibb...@oracle.com>: >> >>>>> Sergey, >> >>>>> >> >>>>> It is equally bad to create a new class in the java.lang package as it >> >>>>> is to add a new public method to java.lang.String. >> >>>>> >> >>>>> -- Jon >> >>>>> >> >>>>> On 2/3/20 2:38 PM, Сергей Цыпанов wrote: >> >>>>>> Hello, >> >>>>>> >> >>>>>> as of JDK14 java.util.StringJoiner still uses char[] as a storage of >> glued >> >>>>>> Strings. >> >>>>>> >> >>>>>> This applies for the cases when all joined Strings as well as >> delimiter, prefix >> >>>>>> and suffix contain only ASCII symbols. >> >>>>>> >> >>>>>> As a result when StringJoiner.toString() is invoked, byte[] stored >> in String is >> >>>>>> inflated in order to fill in char[] and >> >>>>>> finally char[] is compressed when constructor of String is called: >> >>>>>> >> >>>>>> String delimiter = this.delimiter; >> >>>>>> char[] chars = new char[this.len + addLen]; >> >>>>>> int k = getChars(this.prefix, chars, 0); >> >>>>>> if (size > 0) { >> >>>>>> k += getChars(elts[0], chars, k); // inflate byte[] -> char[] >> >>>>>> >> >>>>>> for(int i = 1; i < size; ++i) { >> >>>>>> k += getChars(delimiter, chars, k); >> >>>>>> k += getChars(elts[i], chars, k); >> >>>>>> } >> >>>>>> } >> >>>>>> >> >>>>>> k += getChars(this.suffix, chars, k); >> >>>>>> return new String(chars); // compress char[] -> byte[] >> >>>>>> >> >>>>>> This can be improved by detecting cases when String.isLatin1() >> returns true for >> >>>>>> all involved Strings. >> >>>>>> >> >>>>>> I've prepared a patch along with benchmark proving that this change >> is correct >> >>>>>> and brings improvement. >> >>>>>> The only concern I have is about String.isLatin1(): as far as String >> belongs to >> >>>>>> java.lang and StringJoiner to java.util >> >>>>>> package-private String.isLatin1() cannot be directly accessed, we >> need to make >> >>>>>> it public for successful compilation. >> >>>>>> >> >>>>>> Another solution is to create an intermediate utility class located >> in java.lang >> >>>>>> which delegates the call to String.isLatin1(): >> >>>>>> >> >>>>>> package java.lang; >> >>>>>> >> >>>>>> public class StringHelper { >> >>>>>> public static boolean isLatin1(String str) { >> >>>>>> return str.isLatin1(); >> >>>>>> } >> >>>>>> } >> >>>>>> >> >>>>>> This allows to keep java.lang.String intact and have access to it's >> >>>>>> package-private method outside of java.lang package. >> >>>>>> >> >>>>>> Below I've added results of benchmarking for specified case (all >> Strings are >> >>>>>> Latin1). The other case (at least one String is UTF-8) uses existing >> code so >> >>>>>> there will be only a tiny regression due to several if-checks. >> >>>>>> >> >>>>>> With best regards, >> >>>>>> Sergey Tsypanov >> >>>>>> >> >>>>>> (count) (length) Original Patched Units >> >>>>>> stringJoiner 1 1 26.7 ± 1.3 38.2 ± 1.1 ns/op >> >>>>>> stringJoiner 1 5 27.4 ± 0.0 40.5 ± 2.2 ns/op >> >>>>>> stringJoiner 1 10 29.6 ± 1.9 38.4 ± 1.9 ns/op >> >>>>>> stringJoiner 1 100 61.1 ± 6.9 47.6 ± 0.6 ns/op >> >>>>>> stringJoiner 5 1 91.1 ± 6.7 83.6 ± 2.0 ns/op >> >>>>>> stringJoiner 5 5 96.1 ± 10.7 85.6 ± 1.1 ns/op >> >>>>>> stringJoiner 5 10 105.5 ± 14.3 84.7 ± 1.1 ns/op >> >>>>>> stringJoiner 5 100 266.6 ± 30.1 139.6 ± 14.0 ns/op >> >>>>>> stringJoiner 10 1 190.7 ± 23.0 162.0 ± 2.9 ns/op >> >>>>>> stringJoiner 10 5 200.0 ± 16.9 167.5 ± 11.0 ns/op >> >>>>>> stringJoiner 10 10 216.4 ± 12.4 164.8 ± 1.7 ns/op >> >>>>>> stringJoiner 10 100 545.3 ± 49.7 282.2 ± 12.0 ns/op >> >>>>>> stringJoiner 100 1 1467.0 ± 90.3 1302.0 ± 18.5 ns/op >> >>>>>> stringJoiner 100 5 1491.8 ± 166.2 1493.0 ± 135.4 ns/op >> >>>>>> stringJoiner 100 10 1768.8 ± 160.6 1760.8 ± 111.4 ns/op >> >>>>>> stringJoiner 100 100 3654.3 ± 113.1 3120.9 ± 175.9 ns/op >> >>>>>> >> >>>>>> stringJoiner:·gc.alloc.rate.norm 1 1 120.0 ± 0.0 120.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 1 5 128.0 ± 0.0 120.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 1 10 144.0 ± 0.0 136.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 1 100 416.0 ± 0.0 312.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 5 1 144.0 ± 0.0 136.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 5 5 200.0 ± 0.0 168.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 5 10 272.0 ± 0.0 216.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 5 100 1632.0 ± 0.0 1128.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 10 1 256.0 ± 0.0 232.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 10 5 376.0 ± 0.0 312.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 10 10 520.0 ± 0.0 408.0 ± 0.0 B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 10 100 3224.1 ± 0.0 2216.1 ± 0.0 >> B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 100 1 1760.2 ± 14.9 1544.2 ± 0.0 >> B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 100 5 2960.3 ± 14.9 2344.2 ± 0.0 >> B/op >> >>>>>> stringJoiner:·gc.alloc.rate.norm 100 10 4440.4 ± 0.0 3336.3 ± 0.0 >> B/op >> >> >> >> stringJoiner:·gc.alloc.rate.norm 100 100 31449.3 ± 12.2 21346.7 ± >> 14.7 B/op