[ https://issues.apache.org/jira/browse/FLINK-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987643#comment-16987643 ]

Roman Grebennikov commented on FLINK-14346:
-------------------------------------------

Benchmarks for larger strings also consistently outperform the original implementation:

 
{noformat}
master:

Benchmark                                 (lengthStr)   (type)   Mode  Cnt    Score     Error   Units
StringSerializationBenchmark.stringRead          1024    ascii  thrpt   30  769.067 ±   9.803  ops/ms
StringSerializationBenchmark.stringRead          1024  russian  thrpt   30  293.632 ±  22.269  ops/ms
StringSerializationBenchmark.stringRead          1024  chinese  thrpt   30  260.280 ±   0.768  ops/ms
StringSerializationBenchmark.stringRead          4096    ascii  thrpt   30  144.826 ±  21.883  ops/ms
StringSerializationBenchmark.stringRead          4096  russian  thrpt   30   74.815 ±   1.635  ops/ms
StringSerializationBenchmark.stringRead          4096  chinese  thrpt   30   67.306 ±   2.223  ops/ms
StringSerializationBenchmark.stringRead         16384    ascii  thrpt   30   53.418 ±   0.589  ops/ms
StringSerializationBenchmark.stringRead         16384  russian  thrpt   30   20.338 ±   0.374  ops/ms
StringSerializationBenchmark.stringRead         16384  chinese  thrpt   30   17.313 ±   0.126  ops/ms
StringSerializationBenchmark.stringRead         65536    ascii  thrpt   30   10.042 ±   1.524  ops/ms
StringSerializationBenchmark.stringRead         65536  russian  thrpt   30    5.055 ±   0.018  ops/ms
StringSerializationBenchmark.stringRead         65536  chinese  thrpt   30    4.342 ±   0.037  ops/ms
StringSerializationBenchmark.stringWrite         1024    ascii  thrpt   30  771.981 ± 160.013  ops/ms
StringSerializationBenchmark.stringWrite         1024  russian  thrpt   30  456.973 ±   1.563  ops/ms
StringSerializationBenchmark.stringWrite         1024  chinese  thrpt   30  250.321 ±   0.953  ops/ms
StringSerializationBenchmark.stringWrite         4096    ascii  thrpt   30  106.595 ±   0.496  ops/ms
StringSerializationBenchmark.stringWrite         4096  russian  thrpt   30   70.336 ±   0.157  ops/ms
StringSerializationBenchmark.stringWrite         4096  chinese  thrpt   30   49.363 ±   0.236  ops/ms
StringSerializationBenchmark.stringWrite        16384    ascii  thrpt   30   26.593 ±   0.099  ops/ms
StringSerializationBenchmark.stringWrite        16384  russian  thrpt   30   17.362 ±   0.077  ops/ms
StringSerializationBenchmark.stringWrite        16384  chinese  thrpt   30   13.487 ±   1.534  ops/ms
StringSerializationBenchmark.stringWrite        65536    ascii  thrpt   30   11.295 ±   2.286  ops/ms
StringSerializationBenchmark.stringWrite        65536  russian  thrpt   30    5.805 ±   0.753  ops/ms
StringSerializationBenchmark.stringWrite        65536  chinese  thrpt   30    3.707 ±   0.326  ops/ms

this PR:

Benchmark                                 (lengthStr)   (type)   Mode  Cnt    Score   Error   Units
StringSerializationBenchmark.stringRead          1024    ascii  thrpt   30   70.249 ± 0.458  ops/ms
StringSerializationBenchmark.stringRead          1024  russian  thrpt   30   36.628 ± 0.091  ops/ms
StringSerializationBenchmark.stringRead          1024  chinese  thrpt   30   24.181 ± 0.094  ops/ms
StringSerializationBenchmark.stringRead          4096    ascii  thrpt   30   17.698 ± 0.313  ops/ms
StringSerializationBenchmark.stringRead          4096  russian  thrpt   30    9.086 ± 0.064  ops/ms
StringSerializationBenchmark.stringRead          4096  chinese  thrpt   30    6.048 ± 0.024  ops/ms
StringSerializationBenchmark.stringRead         16384    ascii  thrpt   30    4.382 ± 0.024  ops/ms
StringSerializationBenchmark.stringRead         16384  russian  thrpt   30    2.270 ± 0.008  ops/ms
StringSerializationBenchmark.stringRead         16384  chinese  thrpt   30    1.515 ± 0.007  ops/ms
StringSerializationBenchmark.stringRead         65536    ascii  thrpt   30    1.109 ± 0.005  ops/ms
StringSerializationBenchmark.stringRead         65536  russian  thrpt   30    0.567 ± 0.002  ops/ms
StringSerializationBenchmark.stringRead         65536  chinese  thrpt   30    0.379 ± 0.002  ops/ms
StringSerializationBenchmark.stringWrite         1024    ascii  thrpt   30  175.745 ± 1.416  ops/ms
StringSerializationBenchmark.stringWrite         1024  russian  thrpt   30   52.724 ± 0.231  ops/ms
StringSerializationBenchmark.stringWrite         1024  chinese  thrpt   30   45.952 ± 5.209  ops/ms
StringSerializationBenchmark.stringWrite         4096    ascii  thrpt   30   42.445 ± 0.288  ops/ms
StringSerializationBenchmark.stringWrite         4096  russian  thrpt   30   22.000 ± 0.320  ops/ms
StringSerializationBenchmark.stringWrite         4096  chinese  thrpt   30   13.603 ± 1.681  ops/ms
StringSerializationBenchmark.stringWrite        16384    ascii  thrpt   30    7.062 ± 0.042  ops/ms
StringSerializationBenchmark.stringWrite        16384  russian  thrpt   30    3.532 ± 0.022  ops/ms
StringSerializationBenchmark.stringWrite        16384  chinese  thrpt   30    2.527 ± 0.015  ops/ms
StringSerializationBenchmark.stringWrite        65536    ascii  thrpt   30    1.741 ± 0.007  ops/ms
StringSerializationBenchmark.stringWrite        65536  russian  thrpt   30    0.893 ± 0.002  ops/ms
StringSerializationBenchmark.stringWrite        65536  chinese  thrpt   30    0.635 ± 0.004  ops/ms
{noformat}

> Performance issue with StringSerializer
> ---------------------------------------
>
>                 Key: FLINK-14346
>                 URL: https://issues.apache.org/jira/browse/FLINK-14346
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Type Serialization System, Benchmarks
>    Affects Versions: 1.9.0, 1.10.0, 1.9.1
>         Environment: Tested on Flink 1.10.0-SNAPSHOT-20191129-034045-139, adoptopenjdk 8u222.
>            Reporter: Roman Grebennikov
>            Priority: Major
>              Labels: performance, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While profiling our state-heavy Flink streaming job, we found that a
> significant amount of CPU time is spent inside StringSerializer writing data
> to the underlying byte buffer. The hottest part of the code is the
> StringValue.writeString function. Replacing the default StringSerializer with
> a custom one that simply calls DataOutput.writeUTF/readUTF (just to get a
> baseline) surprisingly yielded an almost 2x speedup for string serialization.
> As writeUTF and writeString have incompatible wire formats, replacing the
> latter with the former is not a good idea in general, as it may break
> checkpoint/savepoint compatibility.
> We also did an early analysis of the root cause of this performance issue.
> The main reason the JDK's writeUTF is faster is that its code does not write
> to the output stream byte by byte, but instead encodes into a temporary byte
> buffer first. This lets HotSpot almost perfectly unroll the main loop, which
> results in much better data parallelism.
> I've tried to port the ideas from the JDK's implementation of writeUTF back
> to StringValue.writeString, and the current result is promising, with a
> significant speedup over the current implementation (about 1.7x, 156.9 vs
> 94.0 ns/op):
> {noformat}
> [info] Benchmark                             Mode  Cnt    Score   Error  Units
> [info] StringSerializerBenchmark.measureJDK  avgt   30   82.871 ± 1.293  ns/op
> [info] StringSerializerBenchmark.measureNew  avgt   30   94.004 ± 1.491  ns/op
> [info] StringSerializerBenchmark.measureOld  avgt   30  156.905 ± 3.596  ns/op
> {noformat}
>  
> Here, measureJDK is the JDK's writeUTF as a baseline, measureOld is the
> current upstream implementation in Flink, and measureNew is the improved one.
>  
> The code for the benchmark (and the improved version of the serializer) is
> here: [https://github.com/shuttie/flink-string-serializer]
>  
> Next steps:
>  # More benchmarks for non-ASCII strings.
>  # Benchmarks for long strings.
>  # Benchmarks for deserialization.
>  # Tests for old/new wire format compatibility.
>  # PR to the Flink codebase.
>
> Is there interest in this kind of performance improvement?
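
The write and read paths discussed above can be illustrated with a small standalone Java sketch. It is not Flink's StringValue code or the code in the linked repository. It assumes a 7-bit variable-length scheme resembling StringValue.writeString: the string is prefixed with length + 1, where 0 marks null, and each char is written low 7 bits first, with the high bit as a continuation flag. The class StringCodecSketch and its methods are hypothetical. writeByteByByte issues one DataOutput.write call per byte. writeBuffered encodes into a temporary byte[] with an ASCII fast path, in the spirit of the JDK's writeUTF, and then hands the array to the stream in a single write. Comparing the two outputs and decoding them with read is one simple way to check old/new wire format compatibility across ASCII, Cyrillic and CJK input.

```java
import java.io.*;
import java.util.Arrays;
import java.util.Objects;

public class StringCodecSketch {
    // Continuation bit of the assumed 7-bit variable-length encoding.
    static final int HIGH_BIT = 0x80;

    static void writeVarInt(int v, DataOutput out) throws IOException {
        while (v >= HIGH_BIT) {
            out.write(v | HIGH_BIT); // write(int) keeps only the low 8 bits
            v >>>= 7;
        }
        out.write(v);
    }

    // "Old" style: one DataOutput.write call per encoded byte.
    // A length of 0 marks null, so strings are prefixed with length + 1.
    static void writeByteByByte(String s, DataOutput out) throws IOException {
        if (s == null) { out.write(0); return; }
        writeVarInt(s.length() + 1, out);
        for (int i = 0; i < s.length(); i++) writeVarInt(s.charAt(i), out);
    }

    // "Buffered" style: identical bytes, but encoded into a temporary array
    // first and handed to the stream in a single bulk write.
    static void writeBuffered(String s, DataOutput out) throws IOException {
        if (s == null) { out.write(0); return; }
        int strlen = s.length();
        // Worst case: 5 length bytes + 3 bytes per char (int overflow past ~715M chars).
        byte[] buf = new byte[5 + 3 * strlen];
        int pos = 0, len = strlen + 1;
        while (len >= HIGH_BIT) {
            buf[pos++] = (byte) (len | HIGH_BIT);
            len >>>= 7;
        }
        buf[pos++] = (byte) len;
        int i = 0;
        // ASCII prefix: one byte per char, no inner loop.
        for (; i < strlen && s.charAt(i) < HIGH_BIT; i++) {
            buf[pos++] = (byte) s.charAt(i);
        }
        // Remaining chars: general variable-length path.
        for (; i < strlen; i++) {
            int c = s.charAt(i);
            while (c >= HIGH_BIT) {
                buf[pos++] = (byte) (c | HIGH_BIT);
                c >>>= 7;
            }
            buf[pos++] = (byte) c;
        }
        out.write(buf, 0, pos);
    }

    // Read side: decode the varint length, then one varint per char.
    static String read(DataInput in) throws IOException {
        int len = readVarInt(in);
        if (len == 0) return null;
        char[] chars = new char[len - 1];
        for (int i = 0; i < chars.length; i++) chars[i] = (char) readVarInt(in);
        return new String(chars);
    }
    static int readVarInt(DataInput in) throws IOException {
        int result = 0, shift = 0, b;
        while ((b = in.readUnsignedByte()) >= HIGH_BIT) {
            result |= (b & 0x7F) << shift;
            shift += 7;
        }
        return result | (b << shift);
    }

    static byte[] encode(String s, boolean buffered) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            if (buffered) writeBuffered(s, out); else writeByteByByte(s, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String decode(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hello", "\u043F\u0440\u0438\u0432\u0435\u0442", "\u4F60\u597D", null}) {
            byte[] old = encode(s, false), buffered = encode(s, true);
            System.out.println(Arrays.toString(buffered) + " same=" + Arrays.equals(old, buffered)
                    + " roundTrip=" + Objects.equals(s, decode(buffered)));
        }
    }
}
```

Every sample should print same=true roundTrip=true. The printed bytes also show the size cost of non-ASCII text under this scheme: Cyrillic chars take 2 bytes each and CJK chars take 3.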



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
