[ 
https://issues.apache.org/jira/browse/FLINK-37435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935021#comment-17935021
 ] 

Kurt Ostfeld edited comment on FLINK-37435 at 3/13/25 1:03 AM:
---------------------------------------------------------------

I created a new benchmark in the flink-benchmarks project with two files:
[https://gist.github.com/kurtostfeld/1a6a6cf1a73d85f238fe0522be6f2d43]
[https://gist.github.com/kurtostfeld/a7e7bdc36a26bfb793c9d01b1a8520d4]

I'm not checking this in. You can copy these two source files into the source 
tree and run the benchmark via:

```
mvn package
java -jar target/benchmarks.jar -rf csv "org.apache.flink.benchmark.full.KryoBenchmark"
```

Results on my laptop with the Temurin OpenJDK 17 distribution:

Benchmark                         Mode  Cnt     Score       Error   Units
KryoBenchmark.readKryoBaseline   thrpt   25   534.628 ±    6.197  ops/ms
KryoBenchmark.readKryoVersionB   thrpt   25   542.362 ±    7.574  ops/ms
KryoBenchmark.readKryoVersionC   thrpt   25   537.827 ±    8.429  ops/ms
KryoBenchmark.readKryoVersionD   thrpt   25   816.206 ±   11.167  ops/ms
KryoBenchmark.readKryoVersionE   thrpt   25  1255.128 ±   49.761  ops/ms
KryoBenchmark.readKryoVersionF   thrpt   25  2251.305 ±   99.973  ops/ms
KryoBenchmark.readKryoVersionG   thrpt   25  4069.846 ±  820.285  ops/ms

To explain the results, going from the slowest benchmark (the baseline, which
mirrors PojoSerializationBenchmark.readKryo) to the fastest:
 - KryoBenchmark.readKryoBaseline (534.628 ops/ms). This simply mirrors the 
official PojoSerializationBenchmark.readKryo benchmark.
 - KryoBenchmark.readKryoVersionB (542.362 ops/ms). This is a version of the
baseline benchmark expanded for clarity, with nearly identical results.
 - KryoBenchmark.readKryoVersionC (537.827 ops/ms). This version removes 
unnecessary layers of InputStream wrappers. This provides no performance 
improvement.
 - KryoBenchmark.readKryoVersionD (816.206 ops/ms). This version switches from 
NoFetchingInput to OldNoFetchInput which is a near copy/paste of 
NoFetchingInput from before the Kryo upgrade.
 - KryoBenchmark.readKryoVersionE (1255.128 ops/ms). This version switches from 
OldNoFetchInput to Input.
 - KryoBenchmark.readKryoVersionF (2251.305 ops/ms). This switches from the 
heavily customized Kryo created by Flink KryoSerializer to a much simpler Kryo 
configuration.
 - KryoBenchmark.readKryoVersionG (4069.846 ops/ms). This reads via
Input -> byte[], whereas the previous benchmarks read via
Input -> ByteArrayInputStream -> byte[].
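
The relative gains implied by the scores above can be worked out with a few
lines of arithmetic. The following is only a sketch: the class and method names
(KryoSpeedups, speedupOverBaseline) are made up for illustration, and the
numbers are copied from the table, not re-measured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of flink-benchmarks.
class KryoSpeedups {
    // Throughput scores in ops/ms, copied from the results table above.
    static final Map<String, Double> SCORES = new LinkedHashMap<>();

    static {
        SCORES.put("readKryoBaseline", 534.628);
        SCORES.put("readKryoVersionB", 542.362);
        SCORES.put("readKryoVersionC", 537.827);
        SCORES.put("readKryoVersionD", 816.206);
        SCORES.put("readKryoVersionE", 1255.128);
        SCORES.put("readKryoVersionF", 2251.305);
        SCORES.put("readKryoVersionG", 4069.846);
    }

    /** Ratio of a version's throughput to the baseline throughput. */
    static double speedupOverBaseline(String version) {
        return SCORES.get(version) / SCORES.get("readKryoBaseline");
    }

    public static void main(String[] args) {
        // Print one line per version with its speedup relative to the baseline.
        for (String version : SCORES.keySet()) {
            System.out.printf("%-18s %.2fx%n", version, speedupOverBaseline(version));
        }
    }
}
```

Note that the error bar on readKryoVersionG (± 820.285) is wide, so its ratio
is less precise than the others.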

To summarize, that's a ~8x performance difference (534.628 vs. 4069.846
ops/ms) between the way PojoSerializationBenchmark.readKryo works and a more
optimized version, caused by three changes:

1. NoFetchingInput -> OldNoFetchingInput -> Input.
2. Simple Kryo config vs complex Kryo config done by Flink KryoSerializer
3. Input -> byte[] instead of Input -> ByteArrayInputStream -> byte[]
 * It looks like the OldNoFetchingInput -> NoFetchingInput changes made during 
the Kryo upgrade may have caused the performance drop. It's not as simple as 
rolling back those changes. The old NoFetchingInput was causing errors with 
Kryo 5.
 * The only significant change to the NoFetchingInput class is in the require
method. The new require method is mostly a copy/paste from the Kryo 5 Input
class, with changes so that it will never read ahead more than required, which
is the point of the NoFetching variation.
 * Kryo 5 Input runs faster than either the old or new version of 
NoFetchingInput because it will cache or read ahead more than needed. The Flink 
framework doesn't like that, hence the NoFetching variations.
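
To make the read-ahead vs. no-fetch distinction concrete, here is a minimal
sketch using only java.io. It is not the code of Flink's NoFetchingInput or of
Kryo's Input; CountingStream, fillReadAhead, and fillNoFetch are hypothetical
names. It only shows how many bytes each strategy pulls from the underlying
stream when the caller needs 4 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Illustrative only; assumes required <= buffer.length.
class NoFetchSketch {
    /** Wraps a stream and counts how many bytes have been pulled from it. */
    static final class CountingStream extends InputStream {
        private final InputStream in;
        long consumed;

        CountingStream(InputStream in) { this.in = in; }

        @Override public int read() throws IOException {
            int b = in.read();
            if (b >= 0) consumed++;
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) consumed += n;
            return n;
        }
    }

    /** Read-ahead style: ask the stream for as much as the buffer can hold. */
    static void fillReadAhead(InputStream in, byte[] buffer, int required) throws IOException {
        int filled = 0;
        while (filled < required) {
            int n = in.read(buffer, filled, buffer.length - filled);
            if (n < 0) throw new EOFException();
            filled += n;
        }
    }

    /** No-fetch style: never ask the stream for more than the required bytes. */
    static void fillNoFetch(InputStream in, byte[] buffer, int required) throws IOException {
        int filled = 0;
        while (filled < required) {
            int n = in.read(buffer, filled, required - filled);
            if (n < 0) throw new EOFException();
            filled += n;
        }
    }

    /** Bytes pulled from a 16-byte stream when only 4 bytes are required. */
    static long bytesConsumed(boolean readAhead) {
        CountingStream stream = new CountingStream(new ByteArrayInputStream(new byte[16]));
        byte[] buffer = new byte[64];
        try {
            if (readAhead) fillReadAhead(stream, buffer, 4);
            else fillNoFetch(stream, buffer, 4);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stream.consumed;
    }

    public static void main(String[] args) {
        System.out.println("read-ahead consumed: " + bytesConsumed(true));
        System.out.println("no-fetch consumed:   " + bytesConsumed(false));
    }
}
```

The read-ahead variant drains the bytes that follow the requested value, so
they are no longer available to any other reader of the same stream; the
no-fetch variant leaves them in place at the cost of smaller reads.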

One option to consider for performance would be to add a TypeSerializer
deserialize variant that deserializes an object straight from a byte[].
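
As a rough sketch of what such an option could look like, the example below
uses a made-up SketchSerializer interface (it is not Flink's TypeSerializer,
and the byte[] overload is an assumption, not an existing API). The default
method keeps the stream-based path; an implementation may override it to decode
straight from the array.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical design sketch; all names here are illustrative.
class ByteArrayDeserializeSketch {
    /** Stand-in for a serializer interface with an optional byte[] fast path. */
    interface SketchSerializer<T> {
        T deserialize(DataInput source) throws IOException;

        /** Default path: wrap the array in stream layers, then delegate. */
        default T deserialize(byte[] bytes, int offset, int length) {
            try {
                return deserialize(
                        new DataInputStream(new ByteArrayInputStream(bytes, offset, length)));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Uses only the stream-based path inherited from the interface. */
    static final class StreamIntSerializer implements SketchSerializer<Integer> {
        @Override public Integer deserialize(DataInput source) throws IOException {
            return source.readInt();
        }
    }

    /** Overrides the byte[] path to decode without any stream wrappers. */
    static final class DirectIntSerializer implements SketchSerializer<Integer> {
        @Override public Integer deserialize(DataInput source) throws IOException {
            return source.readInt();
        }

        @Override public Integer deserialize(byte[] bytes, int offset, int length) {
            // ByteBuffer is big-endian by default, matching DataInput.readInt().
            return ByteBuffer.wrap(bytes, offset, length).getInt();
        }
    }

    static int viaStream(byte[] bytes) {
        return new StreamIntSerializer().deserialize(bytes, 0, bytes.length);
    }

    static int direct(byte[] bytes) {
        return new DirectIntSerializer().deserialize(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        byte[] bytes = ByteBuffer.allocate(4).putInt(42).array();
        System.out.println(viaStream(bytes) + " " + direct(bytes));
    }
}
```

Keeping the stream-based method as the default means existing serializers
would continue to work unchanged, and only those that benefit need to provide
the direct path.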

The other changes can make this benchmark much faster, but can't easily be
dropped in without bigger architectural changes.


> Kryo related perf regression since March 5th
> --------------------------------------------
>
>                 Key: FLINK-37435
>                 URL: https://issues.apache.org/jira/browse/FLINK-37435
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Type Serialization System, Benchmarks
>    Affects Versions: 2.0.0
>            Reporter: Zakelly Lan
>            Priority: Major
>         Attachments: image-2025-03-07-12-29-54-443.png, 
> profile-results-after.zip, profile-results-before.zip
>
>
> Seems an obvious regression across all Java versions.
> http://flink-speed.xyz/timeline/?exe=6%2C12%2C13&base=&ben=readKryo&env=3&revs=200&equid=off&quarts=on&extr=on
> http://flink-speed.xyz/timeline/?exe=6%2C12%2C13&base=&ben=serializerKryo&env=3&revs=200&equid=off&quarts=on&extr=on



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
