L. C. Hsieh created SPARK-36809:
-----------------------------------

             Summary: Remove broadcast for InSubqueryExec used in DPP
                 Key: SPARK-36809
                 URL: https://issues.apache.org/jira/browse/SPARK-36809
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: L. C. Hsieh


Currently InSubqueryExec includes a broadcast variable that holds the result 
of the DPP filtering-side query. This is odd because the result is never used 
on executors; it is only needed on the driver during query planning. Since we 
already hold the original result, we currently keep two copies of the query 
result.

A related issue: pruningHasBenefit estimates whether DPP is beneficial when 
the join type does not support broadcast. Because of the broadcast variable 
above, it also checks the filtering side against the 
autoBroadcastJoinThreshold config. That config is not meant for this purpose, 
and the join is not a broadcast join. Since the broadcast variable is 
unnecessary, we can remove this check and base the benefit estimate on 
overhead and pruning size alone.
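
To illustrate the intended shape of the estimate once the threshold check is 
gone, here is a rough sketch. It is not the actual Spark code: the function 
name, parameter names, and the simple "estimated pruned bytes versus overhead" 
comparison are assumptions made for illustration only.

```python
def pruning_has_benefit(filter_ratio: float,
                        pruning_side_bytes: int,
                        filtering_side_overhead_bytes: int) -> bool:
    """Hypothetical simplification of the benefit estimate.

    filter_ratio: assumed fraction of the pruning side expected to be pruned.
    pruning_side_bytes: estimated size of the partitioned (pruned) side.
    filtering_side_overhead_bytes: estimated cost of running the filtering
        side query an extra time.
    """
    # Estimated amount of data that DPP would avoid scanning.
    estimated_pruned_bytes = filter_ratio * pruning_side_bytes
    # No comparison against autoBroadcastJoinThreshold: the decision rests
    # only on pruning size versus overhead.
    return estimated_pruned_bytes > filtering_side_overhead_bytes
```

With these assumed inputs, pruning_has_benefit(0.5, 1000, 100) is True and 
pruning_has_benefit(0.5, 1000, 600) is False, regardless of how large the 
filtering side is relative to any broadcast threshold.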



