On Wed, 13 Jan 2021 13:48:40 GMT, Claes Redestad <[email protected]> wrote:
>> Instead of allocating a copy of the underlying array via
>> `CharArrayWriter.toCharArray()` and passing it to the `String` constructor
>>     String str = new String(charArrayWriter.toCharArray());
>> we could call the `toString()` method
>>     String str = charArrayWriter.toString();
>> which decodes the existing char[] without making an intermediate copy. This
>> slightly speeds up the method and at the same time reduces memory
>> consumption when encoding URLs with non-Latin symbols:
>> import java.net.URLEncoder;
>> import java.nio.charset.Charset;
>> import java.util.concurrent.TimeUnit;
>> import org.openjdk.jmh.annotations.*;
>>
>> @State(Scope.Thread)
>> @BenchmarkMode(Mode.AverageTime)
>> @OutputTimeUnit(TimeUnit.NANOSECONDS)
>> @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g"})
>> public class UrlEncoderBenchmark {
>>     private static final Charset charset = Charset.defaultCharset();
>>     private static final String utf8Url =
>>         "https://ru.wikipedia.org/wiki/Организация_Объединённых_Наций"; // UN
>>
>>     @Benchmark
>>     public String encodeUtf8() {
>>         return URLEncoder.encode(utf8Url, charset);
>>     }
>> }
>> The benchmark on my machine gives the following output:
>> before
>> Benchmark                                                        Mode  Cnt     Score    Error  Units
>> UrlEncoderBenchmark.encodeUtf8                                   avgt  100  1166.378 ±  8.411  ns/op
>> UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate                    avgt  100   932.944 ±  6.393  MB/sec
>> UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm               avgt  100  1712.193 ±  0.005  B/op
>> UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space           avgt  100   929.221 ± 24.268  MB/sec
>> UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space.norm      avgt  100  1705.444 ± 43.235  B/op
>> UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space       avgt  100     0.006 ±  0.001  MB/sec
>> UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space.norm  avgt  100     0.011 ±  0.002  B/op
>> UrlEncoderBenchmark.encodeUtf8:·gc.count                         avgt  100   652.000           counts
>> UrlEncoderBenchmark.encodeUtf8:·gc.time                          avgt  100   334.000           ms
>>
>> after
>> Benchmark                                                        Mode  Cnt     Score    Error  Units
>> UrlEncoderBenchmark.encodeUtf8                                   avgt  100  1058.851 ±  6.006  ns/op
>> UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate                    avgt  100   931.489 ±  5.182  MB/sec
>> UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm               avgt  100  1552.176 ±  0.005  B/op
>> UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space           avgt  100   933.491 ± 24.164  MB/sec
>> UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Eden_Space.norm      avgt  100  1555.488 ± 39.204  B/op
>> UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space       avgt  100     0.006 ±  0.001  MB/sec
>> UrlEncoderBenchmark.encodeUtf8:·gc.churn.G1_Survivor_Space.norm  avgt  100     0.010 ±  0.002  B/op
>> UrlEncoderBenchmark.encodeUtf8:·gc.count                         avgt  100   655.000           counts
>> UrlEncoderBenchmark.encodeUtf8:·gc.time                          avgt  100   333.000           ms
>
> Looks good.
>
> I wonder... `CharArrayWriter` is an old and synchronized data structure, and
> since the instance used here isn't shared, that synchronization seems useless.
> And since you're now bypassing the `char[]` and going straight for a `String`,
> you might get better performance with a `StringBuilder` here? (`setLength(0)`
> instead of `reset()`...)
@cl4es `StringBuilder` turns out to be a pessimization in both time and memory
compared with the `toString()` version (`enc` below); try
`org.openjdk.bench.java.net.URLEncodeDecode`:
master
Benchmark                                        (count)  (maxLength)  (mySeed)  Mode  Cnt         Score        Error  Units
testEncodeUTF8                                      1024         1024         3  avgt   25         8.573 ±     0.023  ms/op
testEncodeUTF8:·gc.alloc.rate                       1024         1024         3  avgt   25      1202.896 ±     3.225  MB/sec
testEncodeUTF8:·gc.alloc.rate.norm                  1024         1024         3  avgt   25  11355727.904 ±   196.249  B/op
testEncodeUTF8:·gc.churn.G1_Eden_Space              1024         1024         3  avgt   25      1203.785 ±     6.240  MB/sec
testEncodeUTF8:·gc.churn.G1_Eden_Space.norm         1024         1024         3  avgt   25  11364143.637 ± 52830.222  B/op
testEncodeUTF8:·gc.churn.G1_Survivor_Space          1024         1024         3  avgt   25         0.008 ±     0.001  MB/sec
testEncodeUTF8:·gc.churn.G1_Survivor_Space.norm     1024         1024         3  avgt   25        77.088 ±     9.303  B/op
testEncodeUTF8:·gc.count                            1024         1024         3  avgt   25      1973.000              counts
testEncodeUTF8:·gc.time                             1024         1024         3  avgt   25       996.000              ms
enc
Benchmark                                        (count)  (maxLength)  (mySeed)  Mode  Cnt         Score        Error  Units
testEncodeUTF8                                      1024         1024         3  avgt   25         7.931 ±     0.006  ms/op
testEncodeUTF8:·gc.alloc.rate                       1024         1024         3  avgt   25       965.347 ±     0.736  MB/sec
testEncodeUTF8:·gc.alloc.rate.norm                  1024         1024         3  avgt   25   8430590.163 ±     7.213  B/op
testEncodeUTF8:·gc.churn.G1_Eden_Space              1024         1024         3  avgt   25       966.373 ±     5.248  MB/sec
testEncodeUTF8:·gc.churn.G1_Eden_Space.norm         1024         1024         3  avgt   25   8439563.689 ± 47282.178  B/op
testEncodeUTF8:·gc.churn.G1_Survivor_Space          1024         1024         3  avgt   25         0.007 ±     0.001  MB/sec
testEncodeUTF8:·gc.churn.G1_Survivor_Space.norm     1024         1024         3  avgt   25        60.949 ±     8.405  B/op
testEncodeUTF8:·gc.count                            1024         1024         3  avgt   25      1715.000              counts
testEncodeUTF8:·gc.time                             1024         1024         3  avgt   25       888.000              ms
stringBuilder
Benchmark                                        (count)  (maxLength)  (mySeed)  Mode  Cnt         Score        Error  Units
testEncodeUTF8                                      1024         1024         3  avgt   25         8.115 ±     0.110  ms/op
testEncodeUTF8:·gc.alloc.rate                       1024         1024         3  avgt   25      1259.267 ±    16.716  MB/sec
testEncodeUTF8:·gc.alloc.rate.norm                  1024         1024         3  avgt   25  11249391.875 ±     6.552  B/op
testEncodeUTF8:·gc.churn.G1_Eden_Space              1024         1024         3  avgt   25      1259.937 ±    17.232  MB/sec
testEncodeUTF8:·gc.churn.G1_Eden_Space.norm         1024         1024         3  avgt   25  11255413.875 ± 43636.143  B/op
testEncodeUTF8:·gc.churn.G1_Survivor_Space          1024         1024         3  avgt   25         0.007 ±     0.001  MB/sec
testEncodeUTF8:·gc.churn.G1_Survivor_Space.norm     1024         1024         3  avgt   25        59.461 ±     9.087  B/op
testEncodeUTF8:·gc.count                            1024         1024         3  avgt   25      2236.000              counts
testEncodeUTF8:·gc.time                             1024         1024         3  avgt   25      1089.000              ms
The reason seems to be the single-char `StringBuilder.append()`, which, in
addition to the range check, performs an encoding check and stores each `char`
as two bytes in the `byte[]` held by `AbstractStringBuilder`.
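
For reference, a minimal sketch of the three buffering variants discussed in this thread. It is an illustration only, not the actual `URLEncoder` code: the class `BufferToStringSketch` and its method names are hypothetical, and it just shows that the variants produce the same `String` while differing in how the characters get there.

```java
import java.io.CharArrayWriter;

// Hypothetical illustration; class and method names are not from the patch.
class BufferToStringSketch {

    // Old shape: toCharArray() copies the writer's buffer into a fresh char[],
    // and new String(char[]) then copies that data again into the String.
    static String viaCharArrayCopy(CharArrayWriter writer) {
        return new String(writer.toCharArray());
    }

    // Patched shape: toString() creates the String directly from the
    // writer's internal buffer, so the intermediate char[] is never allocated.
    static String viaToString(CharArrayWriter writer) {
        return writer.toString();
    }

    // StringBuilder variant from the review: setLength(0) plays the role of
    // reset(), and characters are appended one at a time.
    static String viaStringBuilder(StringBuilder sb, String chunk) {
        sb.setLength(0);
        for (int i = 0; i < chunk.length(); i++) {
            sb.append(chunk.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String chunk = "Организация"; // non-Latin input, as in the benchmark URL
        CharArrayWriter writer = new CharArrayWriter();
        writer.write(chunk, 0, chunk.length());

        String a = viaCharArrayCopy(writer);
        String b = viaToString(writer);
        writer.reset(); // reuse the writer for the next run of characters
        String c = viaStringBuilder(new StringBuilder(), chunk);

        System.out.println(a.equals(b) && b.equals(c));
    }
}
```

All three variants return equal strings; they differ only in allocation and per-character cost, which is what the JMH numbers above measure.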
-------------
PR: https://git.openjdk.java.net/jdk/pull/1598