[ 
https://issues.apache.org/jira/browse/FLINK-37435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935021#comment-17935021
 ] 

Kurt Ostfeld commented on FLINK-37435:
--------------------------------------

I created this new benchmark in the flink-benchmarks project:
https://gist.github.com/kurtostfeld/1a6a6cf1a73d85f238fe0522be6f2d43

I'm not checking this in. It is just a single file that you can drop into the 
source tree and run via:

```
mvn package
java -jar target/benchmarks.jar -rf csv "org.apache.flink.benchmark.full.KryoBenchmark"
```

On my laptop, using the Temurin OpenJDK 17 distribution, it produces:

Benchmark                                     Mode  Cnt     Score     Error  Units
KryoBenchmark.readFlinkKryo5POJOBenchmark    thrpt   25   537.134 ±  10.109  ops/ms
KryoBenchmark.readFlinkKryo5vBPOJOBenchmark  thrpt   25   536.995 ±  10.380  ops/ms
KryoBenchmark.readFlinkKryo5vCPOJOBenchmark  thrpt   25   544.460 ±   6.775  ops/ms
KryoBenchmark.readFlinkKryo5vDPOJOBenchmark  thrpt   25  1340.848 ±  22.822  ops/ms
KryoBenchmark.readStreamKryo5POJOBenchmark   thrpt   25  2277.788 ±  88.142  ops/ms
KryoBenchmark.readDirectKryo5POJOBenchmark   thrpt   25  5649.602 ± 685.249  ops/ms

To explain the results, going from the slowest benchmark (which mirrors 
PojoSerializationBenchmark.readKryo) to the fastest:

537.134 ops/ms. KryoBenchmark.readFlinkKryo5POJOBenchmark
This is nearly identical to the official PojoSerializationBenchmark.readKryo 
benchmark.

536.995 ops/ms. KryoBenchmark.readFlinkKryo5vBPOJOBenchmark
This is a version of the first benchmark expanded for clarity, with nearly 
identical results.

544.460 ops/ms. KryoBenchmark.readFlinkKryo5vCPOJOBenchmark
This version removes unnecessary layers of InputStream wrappers. The 
performance improvement is negligible.

1340.848 ops/ms. KryoBenchmark.readFlinkKryo5vDPOJOBenchmark
This version switches from NoFetchingInput to Input. This gives a major 
performance improvement, but we can't do that in KryoSerializer, as 
KryoSerializer needs the "peek" functionality that NoFetchingInput provides.

2277.788 ops/ms. KryoBenchmark.readStreamKryo5POJOBenchmark
The only difference between this and the previous benchmark is that this uses a 
much simpler Kryo serializer configured for this benchmark and does not have 
the full suite of Flink Kryo serializer options registered. I'm quite surprised 
that this delivers such a large performance improvement.

5649.602 ops/ms. KryoBenchmark.readDirectKryo5POJOBenchmark
This is the fastest benchmark. It is like the previous benchmark, but it reads 
Input -> byte[] where the previous benchmark reads Input -> 
ByteArrayInputStream -> byte[].
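
The difference between these two read paths can be pictured with a JDK-only 
analogy. Kryo's Input can be constructed over a byte[] or over an InputStream; 
the sketch below is not Kryo code and is not taken from the gist. The class 
and method names (ReadPathSketch, sumDirect, sumViaStream) are hypothetical and 
only illustrate direct array access versus refilling a private buffer from a 
ByteArrayInputStream.

```java
import java.io.ByteArrayInputStream;

// JDK-only analogy, not Kryo code. All names here are hypothetical.
final class ReadPathSketch {

    // Direct path: consume bytes straight from the caller's array, no copy.
    static long sumDirect(byte[] data) {
        long sum = 0;
        for (byte b : data) {
            sum += b & 0xFF;
        }
        return sum;
    }

    // Stream path: bytes are first copied from the stream into a private
    // buffer (one extra copy plus refill bookkeeping), then consumed.
    static long sumViaStream(byte[] data, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        byte[] buffer = new byte[bufferSize];
        long sum = 0;
        int n;
        // read(...) returns -1 once the stream is exhausted.
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            for (int i = 0; i < n; i++) {
                sum += buffer[i] & 0xFF;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        // Both paths see the same bytes; only the route to them differs.
        System.out.println("direct:     " + sumDirect(data));
        System.out.println("via stream: " + sumViaStream(data, 64));
    }
}
```

Both methods return the same value; the stream path simply does more work per 
byte. Whether this accounts for the whole gap measured above is not something 
this sketch can show.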

To summarize, there is a ~10x performance difference between the way 
PojoSerializationBenchmark.readKryo works and a more optimized version, caused 
by three changes:
1. NoFetchingInput -> Input.
2. Simple Kryo config vs complex Kryo config done by Flink KryoSerializer.
3. Input -> byte[] instead of Input -> ByteArrayInputStream -> byte[].
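
As a quick check on the ~10x figure, the ratios can be recomputed from the 
scores reported above. This is a small standalone Java sketch, not part of the 
gist or of flink-benchmarks; the class and method names (SpeedupSummary, 
speedup, scores) are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone sketch: recomputes speedups from the JMH scores reported above.
// SpeedupSummary and its methods are illustrative names only.
final class SpeedupSummary {

    // Throughput ratio of a candidate relative to a baseline (both in ops/ms).
    static double speedup(double baselineOpsPerMs, double candidateOpsPerMs) {
        return candidateOpsPerMs / baselineOpsPerMs;
    }

    // Scores copied from the results table, in slowest-to-fastest order.
    static Map<String, Double> scores() {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("readFlinkKryo5POJOBenchmark", 537.134);
        scores.put("readFlinkKryo5vBPOJOBenchmark", 536.995);
        scores.put("readFlinkKryo5vCPOJOBenchmark", 544.460);
        scores.put("readFlinkKryo5vDPOJOBenchmark", 1340.848);
        scores.put("readStreamKryo5POJOBenchmark", 2277.788);
        scores.put("readDirectKryo5POJOBenchmark", 5649.602);
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = scores();
        double baseline = scores.get("readFlinkKryo5POJOBenchmark");
        double previous = baseline;
        for (Map.Entry<String, Double> entry : scores.entrySet()) {
            double score = entry.getValue();
            // Print each score with its ratio to the baseline and to the prior row.
            System.out.printf("%-30s %9.3f ops/ms  x%.2f vs baseline  x%.2f vs previous%n",
                    entry.getKey(), score,
                    speedup(baseline, score), speedup(previous, score));
            previous = score;
        }
    }
}
```

Taken step by step, the three changes contribute roughly 2.5x, 1.7x, and 2.5x 
respectively, which multiply out to a little over 10x.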

None of these changes can easily be dropped into Flink without bigger 
architectural changes.

Why is the Kryo upgrade reducing performance? That's the original concern of 
this issue and I'm still not sure.

> Kryo related perf regression since March 5th
> --------------------------------------------
>
>                 Key: FLINK-37435
>                 URL: https://issues.apache.org/jira/browse/FLINK-37435
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Type Serialization System, Benchmarks
>    Affects Versions: 2.0.0
>            Reporter: Zakelly Lan
>            Priority: Major
>         Attachments: image-2025-03-07-12-29-54-443.png, 
> profile-results-after.zip, profile-results-before.zip
>
>
> Seems an obvious regression across all Java versions.
> http://flink-speed.xyz/timeline/?exe=6%2C12%2C13&base=&ben=readKryo&env=3&revs=200&equid=off&quarts=on&extr=on
> http://flink-speed.xyz/timeline/?exe=6%2C12%2C13&base=&ben=serializerKryo&env=3&revs=200&equid=off&quarts=on&extr=on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
