[
https://issues.apache.org/jira/browse/SPARK-53809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dongjoon Hyun updated SPARK-53809:
----------------------------------
Parent: SPARK-51166
Issue Type: Sub-task (was: Improvement)
> Add canonicalization for dsv2 scan
> ----------------------------------
>
> Key: SPARK-53809
> URL: https://issues.apache.org/jira/browse/SPARK-53809
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.1.0
> Reporter: Yuchuan Huang
> Assignee: Yuchuan Huang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.1.0
>
>
> Query optimization rules such as MergeScalarSubqueries check if two plans are
> identical by [comparing their canonicalized
> form|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala#L219].
> For DSv2 physical plans, canonicalization descends through the child
> hierarchy to BatchScanExec, which [has a doCanonicalize
> function|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala#L150].
> For logical plans, canonicalization descends to
> DataSourceV2ScanRelation, which does not have a doCanonicalize
> function. As a result, two logical plans that are semantically identical are
> not recognized as identical.
> This PR proposes adding a doCanonicalize function to DataSourceV2ScanRelation.
> The implementation is similar to [the one implemented in
> BatchScanExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala#L150],
> because they are the leaf nodes of the DSv2 logical plan and physical plan,
> respectively.
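
To illustrate why a leaf node without canonicalization defeats plan comparison, here is a minimal toy model in Python. It is not Spark code: the Attr and ScanRelation classes, their fields, and the same_result helper are hypothetical stand-ins, and the normalization shown (replacing per-instance ids with ordinal positions) is only an assumed simplification of what a doCanonicalize implementation might do.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for an output attribute: a name plus a per-plan unique id."""
    name: str
    expr_id: int


@dataclass(frozen=True)
class ScanRelation:
    """Toy stand-in for a logical scan leaf node (hypothetical, not Spark's class)."""
    table: str
    output: Tuple[Attr, ...]

    def canonicalized(self) -> "ScanRelation":
        # Replace each unique id with the attribute's ordinal position so that
        # cosmetic differences between two plan instances disappear.
        normalized = tuple(Attr(a.name, i) for i, a in enumerate(self.output))
        return replace(self, output=normalized)

    def same_result(self, other: "ScanRelation") -> bool:
        # Two scans are treated as identical when their canonical forms match.
        return self.canonicalized() == other.canonicalized()


# Two scans of the same table that differ only in their generated ids.
scan_a = ScanRelation("t", (Attr("id", 11), Attr("value", 12)))
scan_b = ScanRelation("t", (Attr("id", 27), Attr("value", 28)))
# A scan of a different table.
scan_c = ScanRelation("u", (Attr("id", 31), Attr("value", 32)))

plain_equal = scan_a == scan_b                # raw comparison sees the differing ids
canonical_equal = scan_a.same_result(scan_b)  # canonical forms ignore the ids
different_table = scan_a.same_result(scan_c)  # a real difference is preserved
```

In this sketch the raw comparison fails for the two scans of table t, while the canonicalized comparison succeeds and still distinguishes the scan of table u. That mirrors the gap described above: without a canonical form at the leaf, a rule that compares canonicalized plans cannot match otherwise identical logical plans.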