Dustin Smith created SPARK-54593:
------------------------------------

             Summary: Expand DPP for LogicalRelation and LogicalRDD
                 Key: SPARK-54593
                 URL: https://issues.apache.org/jira/browse/SPARK-54593
             Project: Spark
          Issue Type: Improvement
          Components: Optimizer, SQL
    Affects Versions: 4.0.0, 3.0.0
            Reporter: Dustin Smith


Enable Dynamic Partition Pruning (DPP) for queries that use LocalRelation or 
LogicalRDD as the filtering side of a broadcast join. These logical plan nodes 
typically represent small, already-materialized datasets, which makes them good 
candidates for DPP.
 #  PartitionPruning.scala
 ** Modified hasSelectivePredicate() to treat LocalRelation and LogicalRDD as 
selective filtering sides
 ** Added overhead calculation for LocalRelation and LogicalRDD in 
calculatePlanOverhead()
 ** LogicalRDD is only considered for DPP when it has originStats with 
rowCount, indicating materialized data
 # Test Coverage
 ** Added five tests to DynamicPartitionPruningSuite.scala:
 *** DPP with LocalRelation in broadcast join
 *** DPP with LogicalRDD from cached DataFrame in broadcast join
 *** DPP with empty LocalRelation 
 *** DPP should not trigger for LogicalRDD without originStats 
 *** DPP with large LocalRelation 
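
The eligibility rule described above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the code in PartitionPruning.scala: the types LocalRelationNode, LogicalRddNode and OtherNode and the function qualifiesAsFilteringSide are hypothetical stand-ins, and the overhead calculation is not modeled.

```scala
// Simplified stand-ins for logical plan nodes. These are hypothetical
// types for illustration only, not Spark's own classes.
sealed trait PlanNode
final case class LocalRelationNode(rowCount: Long) extends PlanNode
// originRowCount models originStats.rowCount; None means no stats are available.
final case class LogicalRddNode(originRowCount: Option[Long]) extends PlanNode
// Any other node; the existing selectivity checks are not modeled here.
final case class OtherNode(name: String) extends PlanNode

object DppEligibility {
  // Returns true when the node may act as the DPP filtering side under the
  // rules listed in this issue: LocalRelation always qualifies, LogicalRDD
  // qualifies only when a row count is known.
  def qualifiesAsFilteringSide(node: PlanNode): Boolean = node match {
    case LocalRelationNode(_)     => true
    case LogicalRddNode(rowCount) => rowCount.isDefined
    case OtherNode(_)             => false
  }

  def main(args: Array[String]): Unit = {
    val candidates: Seq[PlanNode] = Seq(
      LocalRelationNode(3L),
      LogicalRddNode(Some(100L)),
      LogicalRddNode(None),
      OtherNode("scan")
    )
    // Print each candidate with the model's decision.
    candidates.foreach { n =>
      println(s"$n -> ${qualifiesAsFilteringSide(n)}")
    }
  }
}
```

The LogicalRddNode(None) case corresponds to the "DPP should not trigger for LogicalRDD without originStats" test listed above.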

Will supply a GitHub PR.


