[ https://issues.apache.org/jira/browse/FLINK-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987643#comment-16987643 ]
Roman Grebennikov commented on FLINK-14346:
-------------------------------------------
In benchmarks for larger strings, the new implementation also consistently
outperforms the original implementation:
{noformat}
master:
Benchmark                                 (lengthStr)   (type)   Mode  Cnt    Score     Error  Units
StringSerializationBenchmark.stringRead          1024    ascii  thrpt   30  769.067 ±   9.803  ops/ms
StringSerializationBenchmark.stringRead          1024  russian  thrpt   30  293.632 ±  22.269  ops/ms
StringSerializationBenchmark.stringRead          1024  chinese  thrpt   30  260.280 ±   0.768  ops/ms
StringSerializationBenchmark.stringRead          4096    ascii  thrpt   30  144.826 ±  21.883  ops/ms
StringSerializationBenchmark.stringRead          4096  russian  thrpt   30   74.815 ±   1.635  ops/ms
StringSerializationBenchmark.stringRead          4096  chinese  thrpt   30   67.306 ±   2.223  ops/ms
StringSerializationBenchmark.stringRead         16384    ascii  thrpt   30   53.418 ±   0.589  ops/ms
StringSerializationBenchmark.stringRead         16384  russian  thrpt   30   20.338 ±   0.374  ops/ms
StringSerializationBenchmark.stringRead         16384  chinese  thrpt   30   17.313 ±   0.126  ops/ms
StringSerializationBenchmark.stringRead         65536    ascii  thrpt   30   10.042 ±   1.524  ops/ms
StringSerializationBenchmark.stringRead         65536  russian  thrpt   30    5.055 ±   0.018  ops/ms
StringSerializationBenchmark.stringRead         65536  chinese  thrpt   30    4.342 ±   0.037  ops/ms
StringSerializationBenchmark.stringWrite         1024    ascii  thrpt   30  771.981 ± 160.013  ops/ms
StringSerializationBenchmark.stringWrite         1024  russian  thrpt   30  456.973 ±   1.563  ops/ms
StringSerializationBenchmark.stringWrite         1024  chinese  thrpt   30  250.321 ±   0.953  ops/ms
StringSerializationBenchmark.stringWrite         4096    ascii  thrpt   30  106.595 ±   0.496  ops/ms
StringSerializationBenchmark.stringWrite         4096  russian  thrpt   30   70.336 ±   0.157  ops/ms
StringSerializationBenchmark.stringWrite         4096  chinese  thrpt   30   49.363 ±   0.236  ops/ms
StringSerializationBenchmark.stringWrite        16384    ascii  thrpt   30   26.593 ±   0.099  ops/ms
StringSerializationBenchmark.stringWrite        16384  russian  thrpt   30   17.362 ±   0.077  ops/ms
StringSerializationBenchmark.stringWrite        16384  chinese  thrpt   30   13.487 ±   1.534  ops/ms
StringSerializationBenchmark.stringWrite        65536    ascii  thrpt   30   11.295 ±   2.286  ops/ms
StringSerializationBenchmark.stringWrite        65536  russian  thrpt   30    5.805 ±   0.753  ops/ms
StringSerializationBenchmark.stringWrite        65536  chinese  thrpt   30    3.707 ±   0.326  ops/ms
this PR:
Benchmark                                 (lengthStr)   (type)   Mode  Cnt    Score     Error  Units
StringSerializationBenchmark.stringRead          1024    ascii  thrpt   30   70.249 ±   0.458  ops/ms
StringSerializationBenchmark.stringRead          1024  russian  thrpt   30   36.628 ±   0.091  ops/ms
StringSerializationBenchmark.stringRead          1024  chinese  thrpt   30   24.181 ±   0.094  ops/ms
StringSerializationBenchmark.stringRead          4096    ascii  thrpt   30   17.698 ±   0.313  ops/ms
StringSerializationBenchmark.stringRead          4096  russian  thrpt   30    9.086 ±   0.064  ops/ms
StringSerializationBenchmark.stringRead          4096  chinese  thrpt   30    6.048 ±   0.024  ops/ms
StringSerializationBenchmark.stringRead         16384    ascii  thrpt   30    4.382 ±   0.024  ops/ms
StringSerializationBenchmark.stringRead         16384  russian  thrpt   30    2.270 ±   0.008  ops/ms
StringSerializationBenchmark.stringRead         16384  chinese  thrpt   30    1.515 ±   0.007  ops/ms
StringSerializationBenchmark.stringRead         65536    ascii  thrpt   30    1.109 ±   0.005  ops/ms
StringSerializationBenchmark.stringRead         65536  russian  thrpt   30    0.567 ±   0.002  ops/ms
StringSerializationBenchmark.stringRead         65536  chinese  thrpt   30    0.379 ±   0.002  ops/ms
StringSerializationBenchmark.stringWrite         1024    ascii  thrpt   30  175.745 ±   1.416  ops/ms
StringSerializationBenchmark.stringWrite         1024  russian  thrpt   30   52.724 ±   0.231  ops/ms
StringSerializationBenchmark.stringWrite         1024  chinese  thrpt   30   45.952 ±   5.209  ops/ms
StringSerializationBenchmark.stringWrite         4096    ascii  thrpt   30   42.445 ±   0.288  ops/ms
StringSerializationBenchmark.stringWrite         4096  russian  thrpt   30   22.000 ±   0.320  ops/ms
StringSerializationBenchmark.stringWrite         4096  chinese  thrpt   30   13.603 ±   1.681  ops/ms
StringSerializationBenchmark.stringWrite        16384    ascii  thrpt   30    7.062 ±   0.042  ops/ms
StringSerializationBenchmark.stringWrite        16384  russian  thrpt   30    3.532 ±   0.022  ops/ms
StringSerializationBenchmark.stringWrite        16384  chinese  thrpt   30    2.527 ±   0.015  ops/ms
StringSerializationBenchmark.stringWrite        65536    ascii  thrpt   30    1.741 ±   0.007  ops/ms
StringSerializationBenchmark.stringWrite        65536  russian  thrpt   30    0.893 ±   0.002  ops/ms
StringSerializationBenchmark.stringWrite        65536  chinese  thrpt   30    0.635 ±   0.004  ops/ms{noformat}
> Performance issue with StringSerializer
> ---------------------------------------
>
> Key: FLINK-14346
> URL: https://issues.apache.org/jira/browse/FLINK-14346
> Project: Flink
> Issue Type: Improvement
> Components: API / Type Serialization System, Benchmarks
> Affects Versions: 1.9.0, 1.10.0, 1.9.1
> Environment: Tested on Flink 1.10.0-SNAPSHOT-20191129-034045-139,
> adoptopenjdk 8u222.
> Reporter: Roman Grebennikov
> Priority: Major
> Labels: performance, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While profiling our state-heavy Flink streaming job, we found that a
> significant amount of CPU time is spent inside StringSerializer writing data
> to the underlying byte buffer. The hottest part of the code is the
> StringValue.writeString function. Replacing the default StringSerializer with
> a custom one that just calls DataOutput.writeUTF/readUTF (only to get a
> baseline) surprisingly yielded an almost 2x speedup for string serialization.
> As writeUTF and writeString have incompatible wire formats, replacing the
> latter with the former is not a good idea in general, as it may break
> checkpoint/savepoint compatibility.
> We also did an early analysis of the root cause of this performance issue.
> The main reason the JDK's writeUTF is faster is that its code does not write
> to the output stream byte by byte, but instead fills a temporary byte buffer
> first. This lets HotSpot unroll the main loop almost perfectly, which results
> in much better data parallelism.
> I've tried to port the ideas from the JVM's implementation of writeUTF back
> to StringValue.writeString, and the current result shows a significant
> speedup compared to the current implementation:
> {{[info] Benchmark Mode Cnt Score Error Units}}
> {{[info] StringSerializerBenchmark.measureJDK avgt 30 82.871 ± 1.293 ns/op}}
> {{[info] StringSerializerBenchmark.measureNew avgt 30 94.004 ± 1.491 ns/op}}
> {{[info] StringSerializerBenchmark.measureOld avgt 30 156.905 ± 3.596 ns/op}}
>
> {{Where measureJDK is the JDK's writeUTF as a baseline, measureOld is the
> current upstream implementation in Flink, and measureNew is the improved
> one.}}
>
> {{The code for the benchmark (and the improved version of the serializer) is
> here: [https://github.com/shuttie/flink-string-serializer]}}
>
> {{Next steps:}}
> # {{More benchmarks for non-ascii strings.}}
> # {{Benchmarks for long strings.}}
> # {{Benchmarks for deserialization.}}
> # {{Tests for old-new wire format compatibility.}}
> # {{PR to the Flink codebase.}}
> {{Is there interest in this kind of performance improvement?}}
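
The buffering idea described in the issue, together with the planned old-new wire format check, can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the class name, the method names, and the simplified 7-bit variable-length encoding are all assumptions made for this example, and the sketch does not claim to reproduce Flink's exact wire format. It only shows the pattern of filling a temporary byte array and issuing one bulk write, and that this produces the same bytes as the byte-by-byte variant.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch: names and encoding are illustrative assumptions.
public class BufferedStringWriteSketch {

    // Byte-by-byte variant: one DataOutput.write call per emitted byte.
    static void writeDirect(String s, DataOutput out) throws IOException {
        int len = s.length();
        // Length prefix as a 7-bit variable-length integer.
        while (len >= 0x80) {
            out.write(len | 0x80);
            len >>>= 7;
        }
        out.write(len);
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            // Each char uses the same 7-bit variable-length scheme.
            while (c >= 0x80) {
                out.write(c | 0x80);
                c >>>= 7;
            }
            out.write(c);
        }
    }

    // Buffered variant: encode into a temporary array, then one bulk write.
    static void writeBuffered(String s, DataOutput out) throws IOException {
        int n = s.length();
        // Worst case: 5 bytes for the length, 3 bytes per 16-bit char.
        byte[] buf = new byte[5 + 3 * n];
        int pos = 0;
        int len = n;
        while (len >= 0x80) {
            buf[pos++] = (byte) (len | 0x80);
            len >>>= 7;
        }
        buf[pos++] = (byte) len;
        for (int i = 0; i < n; i++) {
            int c = s.charAt(i);
            while (c >= 0x80) {
                buf[pos++] = (byte) (c | 0x80);
                c >>>= 7;
            }
            buf[pos++] = (byte) c;
        }
        out.write(buf, 0, pos);
    }

    // Helper for comparing the two variants on an in-memory stream.
    public static byte[] encode(String s, boolean buffered) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (buffered) {
                writeBuffered(s, out);
            } else {
                writeDirect(s, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // ASCII, Cyrillic, and Chinese samples, mirroring the benchmark types.
        String[] samples = {
            "hello",
            "\u043f\u0440\u0438\u0432\u0435\u0442",
            "\u4e2d\u6587"
        };
        for (String s : samples) {
            byte[] direct = encode(s, false);
            byte[] buffered = encode(s, true);
            System.out.println(s.length() + " chars, " + direct.length
                + " bytes, identical output: " + Arrays.equals(direct, buffered));
        }
    }
}
```

Whether the buffered variant is actually faster depends on the JIT and on the DataOutput implementation behind it, so any such change would still need to be measured, for example with JMH benchmarks like the ones quoted above.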