[
https://issues.apache.org/jira/browse/FLINK-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986958#comment-16986958
]
Roman Grebennikov commented on FLINK-14346:
-------------------------------------------
I've also updated the PR to fix a performance regression in small-string
serialization by adding an unbuffered fallback. For strings shorter than 6
characters, the improved code is now either as fast as the default
implementation or slightly faster.
{noformat}
Before fallback:
[info] Benchmark                                    (length)  (stringType)  Mode  Cnt   Score    Error  Units
[info] StringSerializerBenchmark.serializeDefault          1         ascii  avgt    5  33.383 ±  2.796  ns/op
[info] StringSerializerBenchmark.serializeDefault          2         ascii  avgt    5  32.731 ±  2.470  ns/op
[info] StringSerializerBenchmark.serializeDefault          3         ascii  avgt    5  37.619 ±  3.950  ns/op
[info] StringSerializerBenchmark.serializeDefault          4         ascii  avgt    5  42.452 ±  3.703  ns/op
[info] StringSerializerBenchmark.serializeDefault          5         ascii  avgt    5  46.887 ±  2.906  ns/op
[info] StringSerializerBenchmark.serializeDefault          6         ascii  avgt    5  57.461 ± 14.265  ns/op
[info] StringSerializerBenchmark.serializeDefault          7         ascii  avgt    5  58.337 ±  2.813  ns/op
[info] StringSerializerBenchmark.serializeImproved         1         ascii  avgt    5  37.015 ± 11.327  ns/op
[info] StringSerializerBenchmark.serializeImproved         2         ascii  avgt    5  40.723 ±  9.182  ns/op
[info] StringSerializerBenchmark.serializeImproved         3         ascii  avgt    5  43.556 ± 10.250  ns/op
[info] StringSerializerBenchmark.serializeImproved         4         ascii  avgt    5  48.410 ± 12.323  ns/op
[info] StringSerializerBenchmark.serializeImproved         5         ascii  avgt    5  47.770 ±  7.285  ns/op
[info] StringSerializerBenchmark.serializeImproved         6         ascii  avgt    5  48.477 ±  7.607  ns/op
[info] StringSerializerBenchmark.serializeImproved         7         ascii  avgt    5  49.082 ± 13.026  ns/op

After fallback:
[info] Benchmark                                    (length)  (stringType)  Mode  Cnt   Score    Error  Units
[info] StringSerializerBenchmark.serializeImproved         1         ascii  avgt    5  31.794 ±  0.898  ns/op
[info] StringSerializerBenchmark.serializeImproved         2         ascii  avgt    5  30.904 ±  0.814  ns/op
[info] StringSerializerBenchmark.serializeImproved         3         ascii  avgt    5  35.260 ±  1.481  ns/op
[info] StringSerializerBenchmark.serializeImproved         4         ascii  avgt    5  40.210 ±  1.505  ns/op
[info] StringSerializerBenchmark.serializeImproved         5         ascii  avgt    5  45.301 ±  2.434  ns/op
[info] StringSerializerBenchmark.serializeImproved         6         ascii  avgt    5  43.255 ±  8.550  ns/op
[info] StringSerializerBenchmark.serializeImproved         7         ascii  avgt    5  45.846 ±  7.652  ns/op
{noformat}
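For readers who want to see the shape of the fallback, here is a minimal
sketch, not the code from the PR. It assumes a varint-style encoding (length
offset by one, 7 bits per byte) and a non-null input. The class name, the
method names and the SHORT_STRING_THRESHOLD constant are hypothetical, and
the real cut-off and buffer handling in the PR may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only: names and the threshold are assumptions, not the PR's code.
final class StringWriteSketch {
    private static final int HIGH_BIT = 0x80;
    // Hypothetical cut-off: shorter strings take the unbuffered path.
    static final int SHORT_STRING_THRESHOLD = 6;

    // Unbuffered path: every encoded byte goes straight to the DataOutput.
    static void writeDirect(CharSequence cs, DataOutput out) throws IOException {
        int len = cs.length() + 1; // assumed format: length + 1, varint-encoded
        while (len >= HIGH_BIT) {
            out.write(len | HIGH_BIT);
            len >>>= 7;
        }
        out.write(len);
        for (int i = 0; i < cs.length(); i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) {
                out.write(c | HIGH_BIT);
                c >>>= 7;
            }
            out.write(c);
        }
    }

    // Buffered path: encode into a temporary byte[] and emit it with one write call.
    // No overflow checks for very long strings; this is a sketch.
    static void writeBuffered(CharSequence cs, DataOutput out) throws IOException {
        int strLen = cs.length();
        // Worst case: 5 bytes for the length, 3 bytes for each 16-bit char.
        byte[] buf = new byte[5 + 3 * strLen];
        int pos = 0;
        int len = strLen + 1;
        while (len >= HIGH_BIT) {
            buf[pos++] = (byte) (len | HIGH_BIT);
            len >>>= 7;
        }
        buf[pos++] = (byte) len;
        for (int i = 0; i < strLen; i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) {
                buf[pos++] = (byte) (c | HIGH_BIT);
                c >>>= 7;
            }
            buf[pos++] = (byte) c;
        }
        out.write(buf, 0, pos);
    }

    // Dispatch: short strings skip the buffer allocation entirely.
    static void writeString(CharSequence cs, DataOutput out) throws IOException {
        if (cs.length() < SHORT_STRING_THRESHOLD) {
            writeDirect(cs, out);
        } else {
            writeBuffered(cs, out);
        }
    }

    // Helper for experiments: encode a string through one of the two paths.
    static byte[] encode(String s, boolean buffered) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(bytes);
            if (buffered) {
                writeBuffered(s, out);
            } else {
                writeDirect(s, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Both paths must produce identical bytes, so the fallback changes only the
cost of a call and not the wire format. Comparing encode(s, false) with
encode(s, true) for a few inputs is a cheap way to check that.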
> Performance issue with StringSerializer
> ---------------------------------------
>
> Key: FLINK-14346
> URL: https://issues.apache.org/jira/browse/FLINK-14346
> Project: Flink
> Issue Type: Improvement
> Components: API / Type Serialization System, Benchmarks
> Affects Versions: 1.9.0, 1.10.0, 1.9.1
> Environment: Tested on Flink 1.10.0-SNAPSHOT-20191129-034045-139,
> adoptopenjdk 8u222.
> Reporter: Roman Grebennikov
> Priority: Major
> Labels: performance, pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While profiling our state-heavy Flink streaming job, we found that a
> significant amount of CPU time is spent inside StringSerializer, writing data
> to the underlying byte buffer. The hottest part of the code is the
> StringValue.writeString function. Replacing the default StringSerializer with
> a custom one that just calls DataOutput.writeUTF/readUTF (to get a baseline)
> surprisingly yielded an almost 2x speedup for string serialization.
> As writeUTF and writeString have incompatible wire formats, replacing the
> latter with the former is not a good idea in general, as it may break
> checkpoint/savepoint compatibility.
> We also did an early analysis of the root cause of this performance issue.
> The main reason the JDK's writeUTF is faster is that its code does not write
> to the output stream byte by byte, but instead fills a temporary byte buffer
> first. This lets HotSpot unroll the main loop almost perfectly, which results
> in much better data parallelism.
> I've tried to port the ideas from the JDK's implementation of writeUTF back
> to StringValue.writeString, and the current result is a significant speedup
> over the current implementation:
> {{[info] Benchmark Mode Cnt Score Error Units}}
> {{[info] StringSerializerBenchmark.measureJDK avgt 30 82.871 ± 1.293 ns/op}}
> {{[info] StringSerializerBenchmark.measureNew avgt 30 94.004 ± 1.491 ns/op}}
> {{[info] StringSerializerBenchmark.measureOld avgt 30 156.905 ± 3.596 ns/op}}
>
> {{Where measureJDK is the JDK's writeUTF as a baseline, measureOld is the
> current upstream implementation in Flink, and measureNew is the improved
> one.}}
>
> {{The code for the benchmark (and the improved version of the serializer) is
> here: [https://github.com/shuttie/flink-string-serializer]}}
>
> {{Next steps:}}
> # {{More benchmarks for non-ascii strings.}}
> # {{Benchmarks for long strings.}}
> # {{Benchmarks for deserialization.}}
> # {{Tests for old-new wire format compatibility.}}
> # {{PR to the Flink codebase.}}
> {{Is there interest in this kind of performance improvement?}}
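The wire-format incompatibility mentioned in the description can be made
concrete with a small sketch. DataOutputStream.writeUTF emits a 2-byte
big-endian length followed by modified UTF-8. The viaVarint method below is
an assumed, simplified model of a varint-style string format (length offset
by one, 7 bits per byte), written for illustration only; it is not copied
from Flink. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative sketch only: compares two encodings of the same string.
final class WireFormatSketch {
    private static final int HIGH_BIT = 0x80;

    // JDK format: 2-byte length prefix, then modified UTF-8 bytes.
    static byte[] viaWriteUTF(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Assumed varint-style format: (length + 1) as a varint, then each char as a varint.
    static byte[] viaVarint(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int len = s.length() + 1;
        while (len >= HIGH_BIT) {
            out.write(len | HIGH_BIT);
            len >>>= 7;
        }
        out.write(len);
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            while (c >= HIGH_BIT) {
                out.write(c | HIGH_BIT);
                c >>>= 7;
            }
            out.write(c);
        }
        return out.toByteArray();
    }

    // Render bytes as space-separated lowercase hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(viaWriteUTF("abc"))); // 00 03 61 62 63
        System.out.println(hex(viaVarint("abc")));   // 04 61 62 63
    }
}
```

Even for a plain ASCII string the length prefixes differ, so a reader
expecting one format cannot decode the other. This is why swapping the
serializer would invalidate existing checkpoints and savepoints.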