[GitHub] [iceberg] kbendick commented on issue #5487: Spark3.2 and spark3.3 Dynamic partition pruning is not enabled

GitBox Thu, 11 Aug 2022 13:46:00 -0700


kbendick commented on issue #5487:
URL: https://github.com/apache/iceberg/issues/5487#issuecomment-1212476159


   Can you please provide a sample `MERGE INTO` query @SusurHe?
   
   When I think of dynamic partition pruning, I think of the ability to filter 
out specific partitions from the input table during a query based on some kind 
of `WHERE` clause, not necessarily the results of adaptive query execution.
   
   The dynamic partition pruning I'm thinking of is controlled by Spark's 
`SupportsRuntimeFiltering` interface, which Iceberg has implemented since 0.13.0.
   
   If you could provide the `MERGE INTO` query, the results of `EXPLAIN` (or the 
full explain output from the SQL UI), and roughly the number of partition values 
that were touched (or at least the create table DDL of the table being merged 
into), that would help a lot to debug why you're not getting fewer partition 
files.
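   
   As a rough sketch of what I mean, the plan could be captured with something 
like the following. The table names `db.target` and `db.updates` and the `id` 
join column are hypothetical placeholders, not taken from your job:
   
   ```sql
   -- Hypothetical tables and join column, for illustration only.
   -- EXPLAIN FORMATTED prints the physical plan without running the merge.
   EXPLAIN FORMATTED
   MERGE INTO db.target t
   USING db.updates u
   ON t.id = u.id
   WHEN MATCHED THEN UPDATE SET *
   WHEN NOT MATCHED THEN INSERT *;
   ```
   
   The scan node for the target table in that output is usually the most 
useful part to share.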
   
   However, from what's been provided so far, it's hard to tell whether dynamic 
partition pruning (i.e. partitions being pruned from the input before the join / 
processing) is taking place. I get the impression that's not what you mean, 
though, and that you're more interested in adaptive query execution.
   
   In either case, you might need to set 
`spark.sql.adaptive.coalescePartitions.parallelismFirst` to `false`.
   
   From the docs for `spark.sql.adaptive.coalescePartitions.parallelismFirst` 
found here: https://spark.apache.org/docs/latest/sql-performance-tuning.html
   
   `When true, Spark ignores the target size specified by 
spark.sql.adaptive.advisoryPartitionSizeInBytes (default 64MB) when coalescing 
contiguous shuffle partitions, and only respect the minimum partition size 
specified by spark.sql.adaptive.coalescePartitions.minPartitionSize (default 
1MB), to maximize the parallelism. This is to avoid performance regression when 
enabling adaptive query execution. It's recommended to set this config to false 
and respect the target size specified by 
spark.sql.adaptive.advisoryPartitionSizeInBytes.`
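   
   A minimal sketch of applying that in a Spark SQL session, assuming the job 
runs through SQL and that session-level settings are acceptable (the same keys 
can also be passed as `--conf` options when the job is submitted):
   
   ```sql
   -- Let AQE coalesce shuffle partitions toward the advisory target size
   -- instead of favouring parallelism.
   SET spark.sql.adaptive.coalescePartitions.parallelismFirst=false;

   -- With no value, SET prints the current setting. These two are only
   -- checks that AQE and dynamic partition pruning are switched on.
   SET spark.sql.adaptive.enabled;
   SET spark.sql.optimizer.dynamicPartitionPruning.enabled;
   ```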
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

