[ 
https://issues.apache.org/jira/browse/FLINK-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986880#comment-16986880
 ] 

Roman Grebennikov commented on FLINK-14346:
-------------------------------------------

[~arvid heise] I couldn't work out how to generate the Avro schemas, so I 
disabled those tests, as they should not be affected. Anyway, here are the 
updated benchmark results for this PR 
([https://github.com/apache/flink/pull/10358]), with Avro included:
{noformat}
master:

Benchmark                                                        Mode  Cnt    Score    Error   Units
SerializationFrameworkMiniBenchmarks.serializerAvro             thrpt   30  388.350 ±  5.574  ops/ms
SerializationFrameworkMiniBenchmarks.serializerKryo             thrpt   30  211.344 ±  8.336  ops/ms
SerializationFrameworkMiniBenchmarks.serializerPojo             thrpt   30  470.016 ± 13.141  ops/ms
SerializationFrameworkMiniBenchmarks.serializerRow              thrpt   30  557.009 ±  9.751  ops/ms
SerializationFrameworkMiniBenchmarks.serializerStringHeavyPojo  thrpt   30   88.379 ±  1.292  ops/ms
SerializationFrameworkMiniBenchmarks.serializerTuple            thrpt   30  592.778 ±  8.488  ops/ms

Benchmark                                 (lengthStr)   (type)   Mode  Cnt      Score      Error   Units
PojoSerializationBenchmark.readAvro               N/A      N/A  thrpt   30    598.640 ±   25.763  ops/ms
PojoSerializationBenchmark.readKryo               N/A      N/A  thrpt   30    193.355 ±    6.963  ops/ms
PojoSerializationBenchmark.readPojo               N/A      N/A  thrpt   30    620.239 ±    3.194  ops/ms
PojoSerializationBenchmark.writeAvro              N/A      N/A  thrpt   30    654.290 ±    3.870  ops/ms
PojoSerializationBenchmark.writeKryo              N/A      N/A  thrpt   30    608.389 ±   12.006  ops/ms
PojoSerializationBenchmark.writePojo              N/A      N/A  thrpt   30    828.253 ±    6.037  ops/ms
StringSerializationBenchmark.stringRead             4    ascii  thrpt   30  11445.245 ±   35.093  ops/ms
StringSerializationBenchmark.stringRead             4  russian  thrpt   30   7115.556 ±   25.999  ops/ms
StringSerializationBenchmark.stringRead             4  chinese  thrpt   30   5149.447 ±   30.320  ops/ms
StringSerializationBenchmark.stringRead            32    ascii  thrpt   30   2154.990 ±    6.773  ops/ms
StringSerializationBenchmark.stringRead            32  russian  thrpt   30   1126.236 ±    0.974  ops/ms
StringSerializationBenchmark.stringRead            32  chinese  thrpt   30    772.899 ±    3.538  ops/ms
StringSerializationBenchmark.stringRead           256    ascii  thrpt   30    285.788 ±    0.907  ops/ms
StringSerializationBenchmark.stringRead           256  russian  thrpt   30    144.113 ±    0.793  ops/ms
StringSerializationBenchmark.stringRead           256  chinese  thrpt   30     98.919 ±    0.718  ops/ms
StringSerializationBenchmark.stringWrite            4    ascii  thrpt   30  19755.480 ±  113.023  ops/ms
StringSerializationBenchmark.stringWrite            4  russian  thrpt   30  11731.759 ± 1329.529  ops/ms
StringSerializationBenchmark.stringWrite            4  chinese  thrpt   30  11457.075 ±   64.132  ops/ms
StringSerializationBenchmark.stringWrite           32    ascii  thrpt   30   3349.573 ±   15.093  ops/ms
StringSerializationBenchmark.stringWrite           32  russian  thrpt   30   1464.489 ±   10.258  ops/ms
StringSerializationBenchmark.stringWrite           32  chinese  thrpt   30   1094.098 ±    4.450  ops/ms
StringSerializationBenchmark.stringWrite          256    ascii  thrpt   30    464.168 ±    4.761  ops/ms
StringSerializationBenchmark.stringWrite          256  russian  thrpt   30    269.960 ±   53.424  ops/ms
StringSerializationBenchmark.stringWrite          256  chinese  thrpt   30    189.702 ±   36.327  ops/ms

this PR:

Benchmark                                                        Mode  Cnt    Score    Error   Units
SerializationFrameworkMiniBenchmarks.serializerAvro             thrpt   30  389.392 ±  6.379  ops/ms
SerializationFrameworkMiniBenchmarks.serializerKryo             thrpt   30  217.490 ±  8.975  ops/ms
SerializationFrameworkMiniBenchmarks.serializerPojo             thrpt   30  448.449 ± 11.446  ops/ms
SerializationFrameworkMiniBenchmarks.serializerRow              thrpt   30  521.921 ± 11.082  ops/ms
SerializationFrameworkMiniBenchmarks.serializerStringHeavyPojo  thrpt   30  108.779 ±  2.980  ops/ms
SerializationFrameworkMiniBenchmarks.serializerTuple            thrpt   30  548.718 ± 11.773  ops/ms

Benchmark                                 (lengthStr)   (type)   Mode  Cnt      Score     Error   Units
PojoSerializationBenchmark.readAvro               N/A      N/A  thrpt   30    593.101 ±  30.778  ops/ms
PojoSerializationBenchmark.readKryo               N/A      N/A  thrpt   30    184.984 ±   2.437  ops/ms
PojoSerializationBenchmark.readPojo               N/A      N/A  thrpt   30    657.618 ±   8.342  ops/ms
PojoSerializationBenchmark.writeAvro              N/A      N/A  thrpt   30    632.636 ±   4.231  ops/ms
PojoSerializationBenchmark.writeKryo              N/A      N/A  thrpt   30    609.889 ±   4.084  ops/ms
PojoSerializationBenchmark.writePojo              N/A      N/A  thrpt   30    769.924 ±   8.650  ops/ms
StringSerializationBenchmark.stringRead             4    ascii  thrpt   30  17623.353 ±  48.387  ops/ms
StringSerializationBenchmark.stringRead             4  russian  thrpt   30  10226.762 ±  94.515  ops/ms
StringSerializationBenchmark.stringRead             4  chinese  thrpt   30   7979.150 ±  61.660  ops/ms
StringSerializationBenchmark.stringRead            32    ascii  thrpt   30  13919.065 ±  51.691  ops/ms
StringSerializationBenchmark.stringRead            32  russian  thrpt   30   4537.817 ±  30.646  ops/ms
StringSerializationBenchmark.stringRead            32  chinese  thrpt   30   3263.699 ±  22.664  ops/ms
StringSerializationBenchmark.stringRead           256    ascii  thrpt   30   3183.622 ±  26.376  ops/ms
StringSerializationBenchmark.stringRead           256  russian  thrpt   30   1011.096 ±  12.115  ops/ms
StringSerializationBenchmark.stringRead           256  chinese  thrpt   30    689.678 ±   4.445  ops/ms
StringSerializationBenchmark.stringWrite            4    ascii  thrpt   30  17796.026 ± 143.503  ops/ms
StringSerializationBenchmark.stringWrite            4  russian  thrpt   30  16582.541 ± 372.612  ops/ms
StringSerializationBenchmark.stringWrite            4  chinese  thrpt   30  15225.444 ± 119.326  ops/ms
StringSerializationBenchmark.stringWrite           32    ascii  thrpt   30   9781.345 ± 826.800  ops/ms
StringSerializationBenchmark.stringWrite           32  russian  thrpt   30   8423.629 ±  58.593  ops/ms
StringSerializationBenchmark.stringWrite           32  chinese  thrpt   30   6111.879 ±  37.015  ops/ms
StringSerializationBenchmark.stringWrite          256    ascii  thrpt   30   3620.902 ±  16.969  ops/ms
StringSerializationBenchmark.stringWrite          256  russian  thrpt   30   1801.506 ±  14.516  ops/ms
StringSerializationBenchmark.stringWrite          256  chinese  thrpt   30   1019.450 ±   8.503  ops/ms
{noformat}
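For a quick side-by-side reading of the two runs, something like the following could compute the PR-to-master throughput ratio. This is only a sketch: the class and method names are made up for illustration, and the four rows are copied by hand from the results above.

```java
/** Illustrative helper, not part of the PR or of flink-benchmarks. */
final class ThroughputRatioSketch {

    /** A ratio above 1.0 means the PR run had higher throughput than master. */
    static double ratio(double masterOpsPerMs, double prOpsPerMs) {
        return prOpsPerMs / masterOpsPerMs;
    }

    public static void main(String[] args) {
        // Scores in ops/ms, copied from the benchmark output: {master, this PR}.
        String[] names = {
            "stringWrite 4 ascii",
            "stringWrite 256 ascii",
            "stringRead 32 ascii",
            "serializerStringHeavyPojo"
        };
        double[][] scores = {
            {19755.480, 17796.026},
            {464.168, 3620.902},
            {2154.990, 13919.065},
            {88.379, 108.779}
        };
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-28s %.2fx%n", names[i], ratio(scores[i][0], scores[i][1]));
        }
    }
}
```

The error column is ignored here, so ratios close to 1.0 should be read with the reported error margins in mind.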
I see a slight performance degradation when writing small strings (I suppose 
buffer allocation takes too much time for small strings), so I guess I'll 
add a fallback to the previous version of writeString for that case.
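
A rough sketch of what such a fallback could look like. This is not the code from the PR: the class name, the method names, and the threshold value are assumptions for illustration, the encoding is a simplified rendition of the variable-length format used by StringValue.writeString as I understand it, and null handling is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative only; names and threshold are hypothetical. */
final class WriteStringSketch {
    private static final int HIGH_BIT = 0x80;
    // Hypothetical cut-off; a real value would have to come from benchmarks.
    private static final int SHORT_STRING_THRESHOLD = 8;

    /** Per-byte path: every encoded byte goes straight to the DataOutput. */
    static void writeDirect(CharSequence cs, DataOutput out) throws IOException {
        int len = cs.length() + 1; // offset by one, 0 is reserved for null
        while (len >= HIGH_BIT) { out.write(len | HIGH_BIT); len >>>= 7; }
        out.write(len);
        for (int i = 0; i < cs.length(); i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) { out.write(c | HIGH_BIT); c >>>= 7; }
            out.write(c);
        }
    }

    /** Buffered path: encode into a temporary array, then write it in one call. */
    static void writeBuffered(CharSequence cs, DataOutput out) throws IOException {
        int n = cs.length();
        // Worst case: 5 bytes for the length, 3 bytes per 16-bit char.
        // (The sketch ignores int overflow for extremely long strings.)
        byte[] buf = new byte[5 + 3 * n];
        int pos = 0;
        int len = n + 1;
        while (len >= HIGH_BIT) { buf[pos++] = (byte) (len | HIGH_BIT); len >>>= 7; }
        buf[pos++] = (byte) len;
        for (int i = 0; i < n; i++) {
            int c = cs.charAt(i);
            while (c >= HIGH_BIT) { buf[pos++] = (byte) (c | HIGH_BIT); c >>>= 7; }
            buf[pos++] = (byte) c;
        }
        out.write(buf, 0, pos);
    }

    /** Short strings skip the allocation; both paths emit the same bytes. */
    static void writeString(CharSequence cs, DataOutput out) throws IOException {
        if (cs.length() < SHORT_STRING_THRESHOLD) {
            writeDirect(cs, out);
        } else {
            writeBuffered(cs, out);
        }
    }

    /** Test helper: encodes with one specific path and returns the bytes. */
    static byte[] encode(CharSequence cs, boolean buffered) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (buffered) writeBuffered(cs, out); else writeDirect(cs, out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "abcd", "привет", "你好，世界"}) {
            boolean same = java.util.Arrays.equals(encode(s, false), encode(s, true));
            System.out.println(s.length() + " chars, identical bytes: " + same);
        }
    }
}
```

The point of keeping both paths byte-for-byte identical is that the choice between them stays a pure performance decision and does not touch the wire format.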

> Performance issue with StringSerializer
> ---------------------------------------
>
>                 Key: FLINK-14346
>                 URL: https://issues.apache.org/jira/browse/FLINK-14346
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / Type Serialization System, Benchmarks
>    Affects Versions: 1.9.0, 1.10.0, 1.9.1
>         Environment: Tested on Flink 1.10.0-SNAPSHOT-20191129-034045-139, 
> adoptopenjdk 8u222.
>            Reporter: Roman Grebennikov
>            Priority: Major
>              Labels: performance, pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> While profiling our state-heavy Flink streaming job, 
> we found that a significant amount of CPU time is spent inside 
> StringSerializer writing data to the underlying byte buffer. The hottest part 
> of the code is the StringValue.writeString function. Replacing the 
> default StringSerializer with a custom one (just to establish a baseline) 
> that simply calls DataOutput.writeUTF/readUTF surprisingly yielded an 
> almost 2x speedup for string serialization.
> As writeUTF and writeString have incompatible wire formats, replacing the 
> latter with the former is not a good idea in general, as it may break 
> checkpoint/savepoint compatibility.
> We also did an early analysis of the root cause of this 
> performance issue. The main reason the JDK's writeUTF is faster is that 
> its code does not write to the output stream byte by byte, but instead 
> fills a temporary byte buffer first. This lets HotSpot unroll the main loop 
> almost perfectly, which results in much better data 
> parallelism.
> I've tried to port the ideas from the JDK's implementation of writeUTF back 
> to StringValue.writeString, and the current result is promising, with a 
> significant speedup compared to the current implementation:
> {{[info] Benchmark Mode Cnt Score Error Units}}
> {{[info] StringSerializerBenchmark.measureJDK avgt 30 82.871 ± 1.293 ns/op}}
> {{[info] StringSerializerBenchmark.measureNew avgt 30 94.004 ± 1.491 ns/op}}
> {{[info] StringSerializerBenchmark.measureOld avgt 30 156.905 ± 3.596 ns/op}}
>  
> {{Where measureJDK is the JDK's writeUTF as a baseline, measureOld is the 
> current upstream implementation in Flink, and measureNew is the improved 
> one.}}
>  
> {{The code for the benchmark (and the improved version of the serializer) is 
> here: [https://github.com/shuttie/flink-string-serializer]}}
>  
> {{Next steps:}}
>  # {{More benchmarks for non-ascii strings.}}
>  # {{Benchmarks for long strings.}}
>  # {{Benchmarks for deserialization.}}
>  # {{Tests for old-new wire format compatibility.}}
>  # {{PR to the Flink codebase.}}
> {{Is there an interest for this kind of performance improvement?}}
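
On the wire-format point in the quoted description: a small sketch that prints the bytes produced by DataOutputStream.writeUTF next to a variable-length encoding of the kind StringValue.writeString uses. The second method is a simplified rendition from my reading of the Flink code, not a copy of it, and the class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative only; not Flink code. */
final class WireFormatSketch {

    /** JDK format: 2-byte big-endian byte count, then modified UTF-8. */
    static byte[] jdkWriteUTF(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(s);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Assumed Flink-style format: (char count + 1) and then every char,
     * each written 7 bits at a time with the high bit as a continuation flag.
     */
    static byte[] varLengthFormat(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int len = s.length() + 1;
        while (len >= 0x80) { out.write(len | 0x80); len >>>= 7; }
        out.write(len);
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            while (c >= 0x80) { out.write(c | 0x80); c >>>= 7; }
            out.write(c);
        }
        return out.toByteArray();
    }

    /** Formats bytes as space-separated lowercase hex. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "я"}) {
            System.out.println(s + " writeUTF:    " + hex(jdkWriteUTF(s)));
            System.out.println(s + " var-length:  " + hex(varLengthFormat(s)));
        }
    }
}
```

Even for plain ASCII the two differ in the length prefix, so data written with one cannot be read back with the other, which is why swapping the serializer would break existing checkpoints and savepoints.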



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
