[ https://issues.apache.org/jira/browse/FLINK-37435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935021#comment-17935021 ]
Kurt Ostfeld edited comment on FLINK-37435 at 3/13/25 12:34 AM:
----------------------------------------------------------------
I created a new benchmark in the flink-benchmarks project with two files:
https://gist.github.com/kurtostfeld/1a6a6cf1a73d85f238fe0522be6f2d43
https://gist.github.com/kurtostfeld/a7e7bdc36a26bfb793c9d01b1a8520d4
I'm not checking this in. You can copy these two source files into the source
tree and run the benchmark via:
```
mvn package
java -jar target/benchmarks.jar -rf csv "org.apache.flink.benchmark.full.KryoBenchmark"
```
Results on my laptop, using the Temurin OpenJDK 17 distribution:
Benchmark                        Mode  Cnt     Score     Error   Units
KryoBenchmark.readKryoBaseline  thrpt   25   534.628 ±   6.197  ops/ms
KryoBenchmark.readKryoVersionB  thrpt   25   542.362 ±   7.574  ops/ms
KryoBenchmark.readKryoVersionC  thrpt   25   537.827 ±   8.429  ops/ms
KryoBenchmark.readKryoVersionD  thrpt   25   816.206 ±  11.167  ops/ms
KryoBenchmark.readKryoVersionE  thrpt   25  1255.128 ±  49.761  ops/ms
KryoBenchmark.readKryoVersionF  thrpt   25  2251.305 ±  99.973  ops/ms
KryoBenchmark.readKryoVersionG  thrpt   25  4069.846 ± 820.285  ops/ms
To explain the results, going from the slowest benchmark (the baseline, which
mirrors PojoSerializationBenchmark.readKryo) to the fastest:
- KryoBenchmark.readKryoBaseline (534.628 ops/ms). This simply mirrors the
official PojoSerializationBenchmark.readKryo benchmark.
- KryoBenchmark.readKryoVersionB (542.362 ops/ms). This is a version of the
baseline benchmark expanded for clarity, with nearly identical results.
- KryoBenchmark.readKryoVersionC (537.827 ops/ms). This version removes
unnecessary layers of InputStream wrappers. This provides no performance
improvement.
- KryoBenchmark.readKryoVersionD (816.206 ops/ms). This version switches from
NoFetchingInput to OldNoFetchInput which is a near copy/paste of
NoFetchingInput from before the Kryo upgrade.
- KryoBenchmark.readKryoVersionE (1255.128 ops/ms). This version switches from
OldNoFetchInput to Input.
- KryoBenchmark.readKryoVersionF (2251.305 ops/ms). This switches from the
heavily customized Kryo created by Flink KryoSerializer to a much simpler Kryo
configuration.
- KryoBenchmark.readKryoVersionG (4069.846 ops/ms). This does Input -> byte[]
where the previous benchmarks do Input -> ByteArrayInputStream -> byte[].
To summarize, that's a ~8x performance difference (4069.846 vs. 534.628 ops/ms)
between the way PojoSerializationBenchmark.readKryo works and a more optimized
version, caused by three changes:
1. NoFetchingInput -> OldNoFetchingInput -> Input
2. A simple Kryo config instead of the complex Kryo config built by Flink's
KryoSerializer
3. Input -> byte[] instead of Input -> ByteArrayInputStream -> byte[]
It looks like the OldNoFetchingInput -> NoFetchingInput changes made during the
Kryo upgrade may have caused the performance drop.
The other changes can make this benchmark much faster, but they cannot easily
be dropped in without larger architectural changes.
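The ratios behind the ~8x figure can be reproduced from the scores in the
results table. The following is a minimal, standalone sketch (the class name
KryoSpeedups and its helper methods are made up for illustration and are not
part of flink-benchmarks); it only divides the reported throughput numbers and
does not run any benchmark itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of flink-benchmarks: it only does arithmetic
// on the throughput scores reported in the comment.
final class KryoSpeedups {

    // Throughput scores (ops/ms) copied from the results table, slowest first.
    static Map<String, Double> scores() {
        Map<String, Double> s = new LinkedHashMap<>();
        s.put("readKryoBaseline", 534.628);
        s.put("readKryoVersionB", 542.362);
        s.put("readKryoVersionC", 537.827);
        s.put("readKryoVersionD", 816.206);
        s.put("readKryoVersionE", 1255.128);
        s.put("readKryoVersionF", 2251.305);
        s.put("readKryoVersionG", 4069.846);
        return s;
    }

    // Speedup of one throughput score over another.
    static double ratio(double faster, double slower) {
        return faster / slower;
    }

    public static void main(String[] args) {
        Map<String, Double> s = scores();
        double baseline = s.get("readKryoBaseline");
        Double previous = null;
        for (Map.Entry<String, Double> e : s.entrySet()) {
            double score = e.getValue();
            double vsBaseline = ratio(score, baseline);
            // The first row has no predecessor, so report 1.00x for it.
            double vsPrevious = previous == null ? 1.0 : ratio(score, previous);
            System.out.printf("%-18s %9.3f ops/ms  %.2fx vs baseline  %.2fx vs previous%n",
                    e.getKey(), score, vsBaseline, vsPrevious);
            previous = score;
        }
    }
}
```

The overall ratio comes out at about 7.6x. Because readKryoVersionG has a wide
error band (± 820.285 ops/ms), that figure is better read as somewhere between
about 6x and 9x than as a precise number.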
> Kryo related perf regression since March 5th
> --------------------------------------------
>
> Key: FLINK-37435
> URL: https://issues.apache.org/jira/browse/FLINK-37435
> Project: Flink
> Issue Type: Bug
> Components: API / Type Serialization System, Benchmarks
> Affects Versions: 2.0.0
> Reporter: Zakelly Lan
> Priority: Major
> Attachments: image-2025-03-07-12-29-54-443.png,
> profile-results-after.zip, profile-results-before.zip
>
>
> Seems an obvious regression across all Java versions.
> http://flink-speed.xyz/timeline/?exe=6%2C12%2C13&base=&ben=readKryo&env=3&revs=200&equid=off&quarts=on&extr=on
> http://flink-speed.xyz/timeline/?exe=6%2C12%2C13&base=&ben=serializerKryo&env=3&revs=200&equid=off&quarts=on&extr=on
--
This message was sent by Atlassian Jira
(v8.20.10#820010)