aokolnychyi commented on code in PR #7691:
URL: https://github.com/apache/iceberg/pull/7691#discussion_r1202770697
##########
spark/v3.4/spark-extensions/src/jmh/java/org/apache/iceberg/spark/UpdateProjectionBenchmark.java:
##########
@@ -146,6 +146,7 @@ private void setupSpark() {
.config(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED().key(), "false")
.config(SQLConf.ADAPTIVE_EXECUTION_ENABLED().key(), "false")
.config(SQLConf.SHUFFLE_PARTITIONS().key(), "2")
+ .config(SQLConf.CODEGEN_FACTORY_MODE().key(), "CODEGEN_ONLY")
Review Comment:
I did not see a big performance improvement in the existing benchmark because
reads and writes dominate the runtime. The exception is the case with lots of
updates (mergeOnReadUpdate75Percent: 26.108 s/op before, 24.331 s/op after).
However, I have seen reduced memory pressure: the GC count in that case dropped
from 102 to 32. Remember that the new approach without codegen was just a bit
slower than the original projection. That said, codegen provides other benefits
like subexpression elimination. It is important to be on par with the
projection in terms of features.
```
Benchmark                                                             Mode  Cnt    Score    Error  Units
[OLD] UpdateProjectionBenchmark.mergeOnRead10Percent                    ss    5    4.915 ± 0.058   s/op
[OLD] UpdateProjectionBenchmark.mergeOnRead10Percent:·gc.count          ss    5   12.000          counts
[NEW] UpdateProjectionBenchmark.mergeOnRead10Percent                    ss    5    4.920 ± 0.080   s/op
[NEW] UpdateProjectionBenchmark.mergeOnRead10Percent:·gc.count          ss    5   11.000          counts
[OLD] UpdateProjectionBenchmark.mergeOnReadUpdate30Percent              ss    5   10.146 ± 0.347   s/op
[OLD] UpdateProjectionBenchmark.mergeOnReadUpdate30Percent:·gc.count    ss    5   25.000          counts
[NEW] UpdateProjectionBenchmark.mergeOnReadUpdate30Percent              ss    5   10.104 ± 0.122   s/op
[NEW] UpdateProjectionBenchmark.mergeOnReadUpdate30Percent:·gc.count    ss    5   20.000          counts
[OLD] UpdateProjectionBenchmark.mergeOnReadUpdate75Percent              ss    5   26.108 ± 0.343   s/op
[OLD] UpdateProjectionBenchmark.mergeOnReadUpdate75Percent:·gc.count    ss    5  102.000          counts
[NEW] UpdateProjectionBenchmark.mergeOnReadUpdate75Percent              ss    5   24.331 ± 0.392   s/op
[NEW] UpdateProjectionBenchmark.mergeOnReadUpdate75Percent:·gc.count    ss    5   32.000          counts
```
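To make the old/new comparison easier to read, here is a minimal sketch that computes the relative change in time and GC count for each case. The class and method names (`BenchmarkDelta`, `percentChange`) are hypothetical and not part of the PR; the numbers are copied from the benchmark output above, and the error margins are ignored.

```java
import java.util.Locale;

public class BenchmarkDelta {

  // Relative change from the old value to the new one, in percent.
  // A negative result means the new value is lower (less time, fewer GCs).
  public static double percentChange(double oldValue, double newValue) {
    return (newValue - oldValue) / oldValue * 100.0;
  }

  public static void main(String[] args) {
    // Benchmark names as they appear in the output above.
    String[] names = {
      "mergeOnRead10Percent", "mergeOnReadUpdate30Percent", "mergeOnReadUpdate75Percent"
    };
    // {old, new} scores in s/op.
    double[][] seconds = {{4.915, 4.920}, {10.146, 10.104}, {26.108, 24.331}};
    // {old, new} gc.count values.
    double[][] gcCounts = {{12.0, 11.0}, {25.0, 20.0}, {102.0, 32.0}};

    for (int i = 0; i < names.length; i++) {
      System.out.printf(
          Locale.ROOT,
          "%s: time %+.1f%%, gc.count %+.1f%%%n",
          names[i],
          percentChange(seconds[i][0], seconds[i][1]),
          percentChange(gcCounts[i][0], gcCounts[i][1]));
    }
  }
}
```

For the 75% case this works out to roughly 7% less time and roughly 69% fewer GC counts, while the time differences in the 10% and 30% cases are well under 1%.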