L. C. Hsieh created SPARK-36809:
-----------------------------------
Summary: Remove broadcast for InSubqueryExec used in DPP
Key: SPARK-36809
URL: https://issues.apache.org/jira/browse/SPARK-36809
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.0
Reporter: L. C. Hsieh
Currently we include a broadcast variable in InSubqueryExec. We use it to hold
the filtering side query result of dynamic partition pruning (DPP). This looks
odd because we don't use the result on executors; we only need it on the
driver during query planning. We already hold the original result, so at the
moment we hold two copies of the query result.
A related issue: in pruningHasBenefit we estimate whether DPP pruning is
beneficial when the join type does not support broadcast. Because of the
broadcast variable above, we also check the filtering side against the config
autoBroadcastJoinThreshold. That config is not meant for this purpose, and
this is not a broadcast join. Since the broadcast variable is unnecessary, we
can remove this check and leave the benefit estimation to overhead and
pruning size.
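
To make the proposed change concrete, here is a minimal sketch of the two
versions of the check. It is not the Spark implementation. The function names,
parameters, and the exact form of the comparison are assumptions for
illustration; only the ideas (overhead, pruning size, and the
autoBroadcastJoinThreshold check) come from the description above.

```python
def pruning_has_benefit(pruning_side_bytes: int,
                        filter_ratio: float,
                        filtering_overhead_bytes: int) -> bool:
    """Hypothetical, simplified version of the proposed check.

    Assumption: pruning is worthwhile when the bytes we expect to prune
    exceed the overhead of running the filtering side.
    """
    estimated_pruned_bytes = filter_ratio * pruning_side_bytes
    return estimated_pruned_bytes > filtering_overhead_bytes


def pruning_has_benefit_with_threshold(pruning_side_bytes: int,
                                       filter_ratio: float,
                                       filtering_overhead_bytes: int,
                                       filtering_side_bytes: int,
                                       auto_broadcast_join_threshold: int) -> bool:
    """Hypothetical version of the current behavior.

    Same estimate as above, plus the extra requirement that the filtering
    side fits under the broadcast join threshold.
    """
    fits_threshold = filtering_side_bytes <= auto_broadcast_join_threshold
    return fits_threshold and pruning_has_benefit(
        pruning_side_bytes, filter_ratio, filtering_overhead_bytes)


# Illustrative numbers only (not measurements).
MIB = 1024 * 1024
GIB = 1024 * MIB

current = pruning_has_benefit_with_threshold(
    pruning_side_bytes=10 * GIB,
    filter_ratio=0.5,
    filtering_overhead_bytes=64 * MIB,
    filtering_side_bytes=64 * MIB,
    auto_broadcast_join_threshold=10 * MIB,
)
proposed = pruning_has_benefit(
    pruning_side_bytes=10 * GIB,
    filter_ratio=0.5,
    filtering_overhead_bytes=64 * MIB,
)
```

With these made-up inputs the threshold check alone rejects pruning, even
though the estimated pruned size is far larger than the overhead. That is the
case the proposal is meant to address.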