[ 
https://issues.apache.org/jira/browse/FLINK-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984917#comment-16984917
 ] 

Roman Grebennikov commented on FLINK-14346:
-------------------------------------------

I've made a proper PR to port these performance optimizations from the PoC to 
Flink codebase. So it would be really nice if someone will review them 
eventually.

The motivation behind this change is listed in the PR description, but the most 
important benchmark results looking quite good:
{noformat}
[info]  Benchmark                             (length)  (stringType)    Mode    
Cnt     Score       Error       Units
[info]  StringDeserializerBenchmark.deserializeDefault  1       ascii   avgt    
25      46.603  ±       0.750   ns/op
[info]  StringDeserializerBenchmark.deserializeImproved 1       ascii   avgt    
25      51.074  ±       0.720   ns/op
[info]  StringDeserializerBenchmark.deserializeJDK      1       ascii   avgt    
25      63.402  ±       1.631   ns/op
[info]  StringSerializerBenchmark.serializeDefault      1       ascii   avgt    
25      31.595  ±       0.489   ns/op
[info]  StringSerializerBenchmark.serializeImproved     1       ascii   avgt    
25      33.454  ±       0.151   ns/op
[info]  StringSerializerBenchmark.serializeJDK          1       ascii   avgt    
25      34.721  ±       0.128   ns/op

[info]  StringDeserializerBenchmark.deserializeDefault  16      ascii   avgt    
25      251.321 ±       3.251   ns/op
[info]  StringDeserializerBenchmark.deserializeImproved 16      ascii   avgt    
25      55.385  ±       1.176   ns/op
[info]  StringDeserializerBenchmark.deserializeJDK      16      ascii   avgt    
25      77.147  ±       1.661   ns/op
[info]  StringSerializerBenchmark.serializeDefault      16      ascii   avgt    
25      95.782  ±       0.261   ns/op
[info]  StringSerializerBenchmark.serializeImproved     16      ascii   avgt    
25      51.806  ±       0.180   ns/op
[info]  StringSerializerBenchmark.serializeJDK          16      ascii   avgt    
25      50.786  ±       1.677   ns/op

[info]  StringDeserializerBenchmark.deserializeDefault  128     ascii   avgt    
25      1757.726 ±      3.312   ns/op
[info]  StringDeserializerBenchmark.deserializeImproved 128     ascii   avgt    
25      140.374 ±       1.006   ns/op
[info]  StringDeserializerBenchmark.deserializeJDK      128     ascii   avgt    
25      263.445 ±       6.912   ns/op
[info]  StringSerializerBenchmark.serializeDefault      128     ascii   avgt    
25      670.627 ±       2.807   ns/op
[info]  StringSerializerBenchmark.serializeImproved     128     ascii   avgt    
25      161.481 ±       2.798   ns/op
[info]  StringSerializerBenchmark.serializeJDK          128     ascii   avgt    
25      151.789 ±       7.295   ns/op
{noformat}

> Performance issue with StringSerializer
> ---------------------------------------
>
>                 Key: FLINK-14346
>                 URL: https://issues.apache.org/jira/browse/FLINK-14346
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Type Serialization System, Benchmarks
>    Affects Versions: 1.9.0
>         Environment: Tested on Flink 1.9.0, adoptopenjdk 8u222.
>            Reporter: Roman Grebennikov
>            Priority: Major
>              Labels: performance, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While doing a performance profiling for our Flink state-heavy streaming job, 
> we found that quite  a significant amount of CPU time is spent inside 
> StringSerializer writing data to the underlying byte buffer. The hottest part 
> of the code is the StringValue.writeString function. And replacing the 
> default StringSerializer with the custom one (to just play with a baseline), 
> which is just calling DataOutput.writeUTF/readUTF surprisingly yielded to 
> almost 2x speedup for string serialization.
> As writeUTF and writeString have incompatible wire formats, replacing latter 
> with former is not a good idea in general as it may break 
> checkpoint/savepoint compatibility.
> We also did an early performance analysis of the root cause of this 
> performance issue, and the main reason of JDK's writeUTF being faster is that 
> it's code is not writing directly to output stream byte-by-byte, but instead 
> creating an underlying temporary byte buffer. This yields to a HotSpot almost 
> perfectly unrolling the main loop, which results in much better data 
> parallelism.
> I've tried to port the ideas from the JVM's implementation of writeUTF back 
> to StringValue.writeString, and my current result is nice, having quite 
> significant speedup compared to the current implementation:
> {{[info] Benchmark Mode Cnt Score Error Units}}
> {{[info] StringSerializerBenchmark.measureJDK avgt 30 82.871 ± 1.293 ns/op}}
> {{[info] StringSerializerBenchmark.measureNew avgt 30 94.004 ± 1.491 ns/op}}
> {{[info] StringSerializerBenchmark.measureOld avgt 30 156.905 ± 3.596 ns/op}}
>  
> {{Where measureJDK is the JDK's writeUTF asa baseline, measureOld is the 
> current upstream implementation in Flink, and the measureNew is the improved 
> one. }}
>  
> {{The code for the benchmark (and the improved version of the serializer) is 
> here: [https://github.com/shuttie/flink-string-serializer]}}
>  
> {{Next steps:}}
>  # {{More benchmarks for non-ascii strings.}}
>  # {{Benchmarks for long strings.}}
>  # {{Benchmarks for deserialization.}}
>  # {{Tests for old-new wire format compatibility.}}
>  # {{PR to the Flink codebase.}}
> {{Is there an interest for this kind of performance improvement?}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to