[ https://issues.apache.org/jira/browse/SPARK-36809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17417735#comment-17417735 ]
Apache Spark commented on SPARK-36809:
--------------------------------------
User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/34051
> Remove broadcast for InSubqueryExec used in DPP
> -----------------------------------------------
>
> Key: SPARK-36809
> URL: https://issues.apache.org/jira/browse/SPARK-36809
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: L. C. Hsieh
> Priority: Major
>
> Currently InSubqueryExec includes a broadcast variable that holds the result
> of the DPP filtering-side query. This is odd, because the result is never
> used on executors; it is only needed on the driver during query planning.
> We already hold the original result, so at the moment we keep two copies of
> the query result.
>
> A related issue: in pruningHasBenefit we estimate whether DPP pruning is
> beneficial when the join type does not support broadcast. Because of the
> broadcast variable above, we also check the filtering side against the
> config autoBroadcastJoinThreshold. That config is not meant for this
> purpose, and this is not a broadcast join. Since the broadcast variable is
> unnecessary, we can remove this check and leave the benefit estimation to
> the overhead and the pruning size.
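
For readers following along, the proposed change to the benefit check can be modeled roughly as below. This is a simplified sketch and not Spark's actual code: the function names, the parameters, and the exact comparison directions (pruned size greater than overhead, filtering side at or below the threshold) are assumptions made for illustration only.

```python
# Simplified model of the DPP benefit decision described in the issue.
# All names and comparison directions here are hypothetical; the real logic
# lives in Spark's Scala optimizer rules and is more involved.

def has_benefit_with_threshold(pruned_bytes, overhead_bytes,
                               filtering_side_bytes, broadcast_threshold_bytes):
    """Decision with the extra check the issue proposes to remove."""
    # Assumed form of the extra gate: the filtering side must fit under
    # the broadcast threshold, even though no broadcast join is involved.
    fits_threshold = filtering_side_bytes <= broadcast_threshold_bytes
    return fits_threshold and pruned_bytes > overhead_bytes


def has_benefit(pruned_bytes, overhead_bytes):
    """Decision based only on overhead and pruning size."""
    return pruned_bytes > overhead_bytes


if __name__ == "__main__":
    mib = 1024 * 1024
    # Pruning would save 500 MiB at a cost of 50 MiB, but the filtering side
    # (20 MiB) exceeds a 10 MiB threshold.
    print(has_benefit_with_threshold(500 * mib, 50 * mib, 20 * mib, 10 * mib))
    print(has_benefit(500 * mib, 50 * mib))
```

In this model the threshold gate rejects a case where pruning clearly outweighs its overhead, which is the kind of outcome the issue argues against.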