[
https://issues.apache.org/jira/browse/FLINK-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986880#comment-16986880
]
Roman Grebennikov commented on FLINK-14346:
-------------------------------------------
[~arvid heise] I couldn't figure out how to generate the Avro schemas, so I
disabled those tests, as they should not be affected. In any case, here are the
updated benchmark results from this PR
([https://github.com/apache/flink/pull/10358]), with Avro included:
{noformat}
master:
Benchmark                                                       Mode   Cnt      Score     Error  Units
SerializationFrameworkMiniBenchmarks.serializerAvro             thrpt   30    388.350 ±   5.574  ops/ms
SerializationFrameworkMiniBenchmarks.serializerKryo             thrpt   30    211.344 ±   8.336  ops/ms
SerializationFrameworkMiniBenchmarks.serializerPojo             thrpt   30    470.016 ±  13.141  ops/ms
SerializationFrameworkMiniBenchmarks.serializerRow              thrpt   30    557.009 ±   9.751  ops/ms
SerializationFrameworkMiniBenchmarks.serializerStringHeavyPojo  thrpt   30     88.379 ±   1.292  ops/ms
SerializationFrameworkMiniBenchmarks.serializerTuple            thrpt   30    592.778 ±   8.488  ops/ms

Benchmark                                 (lengthStr)  (type)   Mode   Cnt      Score      Error  Units
PojoSerializationBenchmark.readAvro               N/A  N/A      thrpt   30    598.640 ±   25.763  ops/ms
PojoSerializationBenchmark.readKryo               N/A  N/A      thrpt   30    193.355 ±    6.963  ops/ms
PojoSerializationBenchmark.readPojo               N/A  N/A      thrpt   30    620.239 ±    3.194  ops/ms
PojoSerializationBenchmark.writeAvro              N/A  N/A      thrpt   30    654.290 ±    3.870  ops/ms
PojoSerializationBenchmark.writeKryo              N/A  N/A      thrpt   30    608.389 ±   12.006  ops/ms
PojoSerializationBenchmark.writePojo              N/A  N/A      thrpt   30    828.253 ±    6.037  ops/ms
StringSerializationBenchmark.stringRead             4  ascii    thrpt   30  11445.245 ±   35.093  ops/ms
StringSerializationBenchmark.stringRead             4  russian  thrpt   30   7115.556 ±   25.999  ops/ms
StringSerializationBenchmark.stringRead             4  chinese  thrpt   30   5149.447 ±   30.320  ops/ms
StringSerializationBenchmark.stringRead            32  ascii    thrpt   30   2154.990 ±    6.773  ops/ms
StringSerializationBenchmark.stringRead            32  russian  thrpt   30   1126.236 ±    0.974  ops/ms
StringSerializationBenchmark.stringRead            32  chinese  thrpt   30    772.899 ±    3.538  ops/ms
StringSerializationBenchmark.stringRead           256  ascii    thrpt   30    285.788 ±    0.907  ops/ms
StringSerializationBenchmark.stringRead           256  russian  thrpt   30    144.113 ±    0.793  ops/ms
StringSerializationBenchmark.stringRead           256  chinese  thrpt   30     98.919 ±    0.718  ops/ms
StringSerializationBenchmark.stringWrite            4  ascii    thrpt   30  19755.480 ±  113.023  ops/ms
StringSerializationBenchmark.stringWrite            4  russian  thrpt   30  11731.759 ± 1329.529  ops/ms
StringSerializationBenchmark.stringWrite            4  chinese  thrpt   30  11457.075 ±   64.132  ops/ms
StringSerializationBenchmark.stringWrite           32  ascii    thrpt   30   3349.573 ±   15.093  ops/ms
StringSerializationBenchmark.stringWrite           32  russian  thrpt   30   1464.489 ±   10.258  ops/ms
StringSerializationBenchmark.stringWrite           32  chinese  thrpt   30   1094.098 ±    4.450  ops/ms
StringSerializationBenchmark.stringWrite          256  ascii    thrpt   30    464.168 ±    4.761  ops/ms
StringSerializationBenchmark.stringWrite          256  russian  thrpt   30    269.960 ±   53.424  ops/ms
StringSerializationBenchmark.stringWrite          256  chinese  thrpt   30    189.702 ±   36.327  ops/ms

this PR:
Benchmark                                                       Mode   Cnt      Score     Error  Units
SerializationFrameworkMiniBenchmarks.serializerAvro             thrpt   30    389.392 ±   6.379  ops/ms
SerializationFrameworkMiniBenchmarks.serializerKryo             thrpt   30    217.490 ±   8.975  ops/ms
SerializationFrameworkMiniBenchmarks.serializerPojo             thrpt   30    448.449 ±  11.446  ops/ms
SerializationFrameworkMiniBenchmarks.serializerRow              thrpt   30    521.921 ±  11.082  ops/ms
SerializationFrameworkMiniBenchmarks.serializerStringHeavyPojo  thrpt   30    108.779 ±   2.980  ops/ms
SerializationFrameworkMiniBenchmarks.serializerTuple            thrpt   30    548.718 ±  11.773  ops/ms

Benchmark                                 (lengthStr)  (type)   Mode   Cnt      Score      Error  Units
PojoSerializationBenchmark.readAvro               N/A  N/A      thrpt   30    593.101 ±   30.778  ops/ms
PojoSerializationBenchmark.readKryo               N/A  N/A      thrpt   30    184.984 ±    2.437  ops/ms
PojoSerializationBenchmark.readPojo               N/A  N/A      thrpt   30    657.618 ±    8.342  ops/ms
PojoSerializationBenchmark.writeAvro              N/A  N/A      thrpt   30    632.636 ±    4.231  ops/ms
PojoSerializationBenchmark.writeKryo              N/A  N/A      thrpt   30    609.889 ±    4.084  ops/ms
PojoSerializationBenchmark.writePojo              N/A  N/A      thrpt   30    769.924 ±    8.650  ops/ms
StringSerializationBenchmark.stringRead             4  ascii    thrpt   30  17623.353 ±   48.387  ops/ms
StringSerializationBenchmark.stringRead             4  russian  thrpt   30  10226.762 ±   94.515  ops/ms
StringSerializationBenchmark.stringRead             4  chinese  thrpt   30   7979.150 ±   61.660  ops/ms
StringSerializationBenchmark.stringRead            32  ascii    thrpt   30  13919.065 ±   51.691  ops/ms
StringSerializationBenchmark.stringRead            32  russian  thrpt   30   4537.817 ±   30.646  ops/ms
StringSerializationBenchmark.stringRead            32  chinese  thrpt   30   3263.699 ±   22.664  ops/ms
StringSerializationBenchmark.stringRead           256  ascii    thrpt   30   3183.622 ±   26.376  ops/ms
StringSerializationBenchmark.stringRead           256  russian  thrpt   30   1011.096 ±   12.115  ops/ms
StringSerializationBenchmark.stringRead           256  chinese  thrpt   30    689.678 ±    4.445  ops/ms
StringSerializationBenchmark.stringWrite            4  ascii    thrpt   30  17796.026 ±  143.503  ops/ms
StringSerializationBenchmark.stringWrite            4  russian  thrpt   30  16582.541 ±  372.612  ops/ms
StringSerializationBenchmark.stringWrite            4  chinese  thrpt   30  15225.444 ±  119.326  ops/ms
StringSerializationBenchmark.stringWrite           32  ascii    thrpt   30   9781.345 ±  826.800  ops/ms
StringSerializationBenchmark.stringWrite           32  russian  thrpt   30   8423.629 ±   58.593  ops/ms
StringSerializationBenchmark.stringWrite           32  chinese  thrpt   30   6111.879 ±   37.015  ops/ms
StringSerializationBenchmark.stringWrite          256  ascii    thrpt   30   3620.902 ±   16.969  ops/ms
StringSerializationBenchmark.stringWrite          256  russian  thrpt   30   1801.506 ±   14.516  ops/ms
StringSerializationBenchmark.stringWrite          256  chinese  thrpt   30   1019.450 ±    8.503  ops/ms
{noformat}
I see a slight performance degradation when writing small strings (stringWrite
with lengthStr 4, ascii: 19755.480 ops/ms on master vs. 17796.026 ops/ms with
this PR). I suppose buffer allocation takes too much time for small strings, so
I guess I'll add a fallback to the previous version of writeString for that
case.
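
A minimal sketch of what such a fallback could look like, in Java. Everything
here is hypothetical and not taken from the PR: the class name, the
SHORT_STRING_THRESHOLD value, and the varint-style encoding are assumptions
made for illustration only. Short strings take the per-byte path and skip the
buffer allocation; longer strings are encoded into a temporary byte array and
written with a single call. Both paths have to produce identical bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class FallbackStringWriter {
    // Hypothetical cut-off; a real value would have to be chosen by benchmarking.
    static final int SHORT_STRING_THRESHOLD = 8;
    static final int HIGH_BIT = 0x80;

    /** Dispatches on length: per-byte path for short strings, buffered path otherwise. */
    static void writeString(CharSequence cs, DataOutput out) throws IOException {
        if (cs.length() < SHORT_STRING_THRESHOLD) {
            writeDirect(cs, out);
        } else {
            writeBuffered(cs, out);
        }
    }

    /** Writes every encoded byte straight to the output, without allocating. */
    static void writeDirect(CharSequence cs, DataOutput out) throws IOException {
        int len = cs.length();
        while (len >= HIGH_BIT) {
            out.write(len | HIGH_BIT); // only the low 8 bits are written
            len >>>= 7;
        }
        out.write(len);
        for (int i = 0; i < cs.length(); i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) {
                out.write(c | HIGH_BIT);
                c >>>= 7;
            }
            out.write(c);
        }
    }

    /** Encodes into a temporary byte array, then issues a single write call. */
    static void writeBuffered(CharSequence cs, DataOutput out) throws IOException {
        // Worst case: 5 bytes for the length, 3 bytes per 16-bit char.
        // (Overflow for extremely long strings is ignored in this sketch.)
        byte[] buf = new byte[5 + 3 * cs.length()];
        int pos = 0;
        int len = cs.length();
        while (len >= HIGH_BIT) {
            buf[pos++] = (byte) (len | HIGH_BIT);
            len >>>= 7;
        }
        buf[pos++] = (byte) len;
        for (int i = 0; i < cs.length(); i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) {
                buf[pos++] = (byte) (c | HIGH_BIT);
                c >>>= 7;
            }
            buf[pos++] = (byte) c;
        }
        out.write(buf, 0, pos);
    }

    /** Test helper: encodes with one specific path and returns the bytes. */
    static byte[] encode(CharSequence cs, boolean buffered) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (buffered) {
                writeBuffered(cs, out);
            } else {
                writeDirect(cs, out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "hello, \u043c\u0438\u0440";
        // Both paths must agree byte for byte; prints "true".
        System.out.println(Arrays.equals(encode(s, false), encode(s, true)));
    }
}
```

Whether a length threshold is the right switch, and where it should sit, is
something only the benchmarks can answer.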
> Performance issue with StringSerializer
> ---------------------------------------
>
> Key: FLINK-14346
> URL: https://issues.apache.org/jira/browse/FLINK-14346
> Project: Flink
> Issue Type: Improvement
> Components: API / Type Serialization System, Benchmarks
> Affects Versions: 1.9.0, 1.10.0, 1.9.1
> Environment: Tested on Flink 1.10.0-SNAPSHOT-20191129-034045-139,
> adoptopenjdk 8u222.
> Reporter: Roman Grebennikov
> Priority: Major
> Labels: performance, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While profiling our state-heavy Flink streaming job, we found that a
> significant amount of CPU time is spent inside StringSerializer, writing data
> to the underlying byte buffer. The hottest part of the code is the
> StringValue.writeString function. Replacing the default StringSerializer with
> a custom one that just calls DataOutput.writeUTF/readUTF (purely as a
> baseline) surprisingly yielded an almost 2x speedup for string serialization.
> As writeUTF and writeString have incompatible wire formats, replacing the
> latter with the former is not a good idea in general, as it may break
> checkpoint/savepoint compatibility.
> We also did an early analysis of the root cause of this performance issue.
> The main reason the JDK's writeUTF is faster is that its code does not write
> to the output stream byte by byte, but instead fills a temporary byte buffer
> first. This lets HotSpot unroll the main loop almost perfectly, which results
> in much better data parallelism.
> I've tried to port the ideas from the JDK's implementation of writeUTF back
> to StringValue.writeString, and the current result looks good, with a
> significant speedup over the current implementation:
> {{[info] Benchmark Mode Cnt Score Error Units}}
> {{[info] StringSerializerBenchmark.measureJDK avgt 30 82.871 ± 1.293 ns/op}}
> {{[info] StringSerializerBenchmark.measureNew avgt 30 94.004 ± 1.491 ns/op}}
> {{[info] StringSerializerBenchmark.measureOld avgt 30 156.905 ± 3.596 ns/op}}
>
> {{Where measureJDK is the JDK's writeUTF as a baseline, measureOld is the
> current upstream implementation in Flink, and measureNew is the improved
> one.}}
>
> {{The code for the benchmark (and the improved version of the serializer) is
> here: [https://github.com/shuttie/flink-string-serializer]}}
>
> {{Next steps:}}
> # {{More benchmarks for non-ascii strings.}}
> # {{Benchmarks for long strings.}}
> # {{Benchmarks for deserialization.}}
> # {{Tests for old-new wire format compatibility.}}
> # {{PR to the Flink codebase.}}
> {{Is there any interest in this kind of performance improvement?}}
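
To make the wire-format point from the quoted description concrete, here is a
small Java sketch. DataOutputStream.writeUTF is the real JDK call; the
varint-style encoder next to it is only an illustrative stand-in for a
variable-length string format and is an assumption, not Flink's actual
writeString code. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class WireFormatDemo {
    /** Bytes produced by the JDK's writeUTF: 2-byte length prefix plus modified UTF-8. */
    static byte[] jdkUtf(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in format (assumption): varint length, then one varint per char. */
    static byte[] varint(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeVarInt(s.length(), bytes);
        for (int i = 0; i < s.length(); i++) {
            writeVarInt(s.charAt(i), bytes);
        }
        return bytes.toByteArray();
    }

    /** 7 payload bits per byte; the high bit marks "more bytes follow". */
    static void writeVarInt(int value, ByteArrayOutputStream out) {
        while (value >= 0x80) {
            out.write(value | 0x80); // only the low 8 bits are stored
            value >>>= 7;
        }
        out.write(value);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(jdkUtf("abc"))); // [0, 3, 97, 98, 99]
        System.out.println(Arrays.toString(varint("abc"))); // [3, 97, 98, 99]
    }
}
```

Even for a plain ASCII string the two byte sequences differ in the length
prefix, so a reader expecting one format cannot decode data written in the
other. That is why swapping the serializer would affect existing
checkpoints and savepoints.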