[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7691: Spark 3.4: Codegen support for UpdateRowsExec

via GitHub Tue, 23 May 2023 10:42:33 -0700


aokolnychyi commented on code in PR #7691:
URL: https://github.com/apache/iceberg/pull/7691#discussion_r1202770697



##########
spark/v3.4/spark-extensions/src/jmh/java/org/apache/iceberg/spark/UpdateProjectionBenchmark.java:
##########
@@ -146,6 +146,7 @@ private void setupSpark() {
             .config(SQLConf.DYNAMIC_PARTITION_PRUNING_ENABLED().key(), "false")
             .config(SQLConf.ADAPTIVE_EXECUTION_ENABLED().key(), "false")
             .config(SQLConf.SHUFFLE_PARTITIONS().key(), "2")
+            .config(SQLConf.CODEGEN_FACTORY_MODE().key(), "CODEGEN_ONLY")

Review Comment:
   I did not see a big performance improvement in the existing benchmark 
because reads and writes dominate, except in the case with lots of updates. 
However, I did see reduced memory pressure (lower GC counts). Remember that the 
new approach without codegen was just a bit slower than the original 
projection. That said, codegen provides other benefits, such as subexpression 
elimination, and it is important to be on par with the projection in terms of 
features.
   
   ```
   Benchmark                                                             Mode  Cnt    Score   Error  Units
   [OLD] UpdateProjectionBenchmark.mergeOnRead10Percent                    ss    5    4.915 ± 0.058  s/op
   [OLD] UpdateProjectionBenchmark.mergeOnRead10Percent:·gc.count          ss    5   12.000          counts
   [NEW] UpdateProjectionBenchmark.mergeOnRead10Percent                    ss    5    4.920 ± 0.080  s/op
   [NEW] UpdateProjectionBenchmark.mergeOnRead10Percent:·gc.count          ss    5   11.000          counts
   
   [OLD] UpdateProjectionBenchmark.mergeOnReadUpdate30Percent              ss    5   10.146 ± 0.347  s/op
   [OLD] UpdateProjectionBenchmark.mergeOnReadUpdate30Percent:·gc.count    ss    5   25.000          counts
   [NEW] UpdateProjectionBenchmark.mergeOnReadUpdate30Percent              ss    5   10.104 ± 0.122  s/op
   [NEW] UpdateProjectionBenchmark.mergeOnReadUpdate30Percent:·gc.count    ss    5   20.000          counts
   
   [OLD] UpdateProjectionBenchmark.mergeOnReadUpdate75Percent              ss    5   26.108 ± 0.343  s/op
   [OLD] UpdateProjectionBenchmark.mergeOnReadUpdate75Percent:·gc.count    ss    5  102.000          counts
   [NEW] UpdateProjectionBenchmark.mergeOnReadUpdate75Percent              ss    5   24.331 ± 0.392  s/op
   [NEW] UpdateProjectionBenchmark.mergeOnReadUpdate75Percent:·gc.count    ss    5   32.000          counts
   ```
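
To make the comparison in the table easier to read, the sketch below computes the relative change between an [OLD] and a [NEW] row and checks whether the gap is larger than the combined reported error. The class and method names (`BenchmarkDelta`, `percentChange`, `exceedsError`) are made up for illustration and are not part of the PR or of JMH; the numbers are copied from the `mergeOnReadUpdate75Percent` rows.

```java
import java.util.Locale;

public class BenchmarkDelta {

  // Relative change from the old score to the new score, in percent.
  // Negative values mean the new run scored lower (less time, fewer GCs).
  static double percentChange(double oldScore, double newScore) {
    return (newScore - oldScore) / oldScore * 100.0;
  }

  // True when the gap between two scores is larger than the sum of their
  // reported error margins. This is a rough heuristic, not a statistical test.
  static boolean exceedsError(
      double oldScore, double oldErr, double newScore, double newErr) {
    return Math.abs(newScore - oldScore) > oldErr + newErr;
  }

  public static void main(String[] args) {
    // Values copied from the mergeOnReadUpdate75Percent rows of the table.
    double oldTime = 26.108;
    double oldErr = 0.343;
    double newTime = 24.331;
    double newErr = 0.392;
    double oldGc = 102.0;
    double newGc = 32.0;

    System.out.printf(
        Locale.ROOT, "time change: %.1f%%%n", percentChange(oldTime, newTime));
    System.out.printf(
        Locale.ROOT, "gc.count change: %.1f%%%n", percentChange(oldGc, newGc));
    System.out.println(
        "time gap exceeds error: "
            + exceedsError(oldTime, oldErr, newTime, newErr));
  }
}
```

With these inputs, the 75% case comes out roughly 6.8% faster with about 69% fewer GC counts, and the time gap is larger than the combined error. Plugging in the 10% and 30% rows gives time gaps that stay within the reported error, which matches the observation that only the update-heavy case shows a clear speedup.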



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

