kbendick commented on issue #5487: URL: https://github.com/apache/iceberg/issues/5487#issuecomment-1212476159
Can you please provide a sample `MERGE INTO` query, @SusurHe?

When I think of dynamic partition pruning, I think of the ability to filter out specific partitions from the input table during a query based on some kind of `WHERE` clause, not necessarily the results of adaptive query execution. The dynamic partition pruning I'm thinking of is controlled by the `SupportsRuntimeFiltering` interface, which was added in Iceberg 0.13.0.

If you could provide the `MERGE INTO` query, the results of `EXPLAIN` or the full explain output from the SQL UI, and roughly the number of partition values that were touched (or at least the `CREATE TABLE` DDL of the table being merged into), that would help a lot in debugging why you're not getting fewer partition files.

From what's been provided so far, it's hard to tell whether dynamic partition pruning is taking place, i.e. whether partitions are pruned from the input before the join / processing. But I get the impression that's not what you mean and that you're more interested in adaptive query execution.

In either case, you might need to set `spark.sql.adaptive.coalescePartitions.parallelismFirst` to `false`. From the docs for `spark.sql.adaptive.coalescePartitions.parallelismFirst`, found here: https://spark.apache.org/docs/latest/sql-performance-tuning.html

> When true, Spark ignores the target size specified by spark.sql.adaptive.advisoryPartitionSizeInBytes (default 64MB) when coalescing contiguous shuffle partitions, and only respect the minimum partition size specified by spark.sql.adaptive.coalescePartitions.minPartitionSize (default 1MB), to maximize the parallelism. This is to avoid performance regression when enabling adaptive query execution. It's recommended to set this config to false and respect the target size specified by spark.sql.adaptive.advisoryPartitionSizeInBytes.
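
For reference, a minimal sketch of the setting mentioned above, assuming a Spark SQL session on Spark 3.2 or later (the release in which `parallelismFirst` was introduced). The first two lines are only there to make the assumption explicit that adaptive query execution and partition coalescing are enabled. They are not taken from the issue:

```sql
-- Assumption: AQE and shuffle partition coalescing are enabled for this session.
SET spark.sql.adaptive.enabled=true;
SET spark.sql.adaptive.coalescePartitions.enabled=true;

-- Respect spark.sql.adaptive.advisoryPartitionSizeInBytes when coalescing,
-- instead of favoring parallelism over the target partition size.
SET spark.sql.adaptive.coalescePartitions.parallelismFirst=false;
```

The same keys can also be passed as `--conf` options to `spark-submit` or set on the session builder. Whether this reduces the number of output files for a given `MERGE INTO` depends on the plan, which is why the `EXPLAIN` output is still useful.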
