[ https://issues.apache.org/jira/browse/SPARK-53809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuchuan Huang updated SPARK-53809:
----------------------------------
Description:
Query optimization rules such as MergeScalarSubqueries check whether two plans are
identical by [comparing their canonicalized form|#L219]. For DSv2, the
comparison goes down to DataSourceV2ScanRelation in the plan hierarchy, which
currently lacks a canonicalize function.
This ticket aims to add a doCanonicalize function for DataSourceV2ScanRelation,
as well as to the Scan interface. The reason is that two otherwise identical scans may
carry their predicates in a different order after query optimization rewrites, so the
plans fail to match. As a reference, [FileScan
normalizes filters in def equals()|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala#L107]
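To illustrate why predicate order matters, here is a minimal, self-contained Java sketch. `Pred`, `ToyScan`, `canonicalized()` and `sameResult()` are hypothetical stand-ins, not Spark's DSv2 API; the sketch only assumes that canonicalization sorts a scan's predicates into a fixed order before two scans are compared, loosely in the spirit of comparing canonicalized plans.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a pushed-down predicate such as `a > 1`.
// Real DSv2 scans carry connector Predicate objects, which this sketch does not model.
record Pred(String column, String op, String value) {
    // Total order over all fields, so sorting always yields one deterministic sequence.
    static final Comparator<Pred> ORDER =
        Comparator.comparing(Pred::column)
                  .thenComparing(Pred::op)
                  .thenComparing(Pred::value);
}

// Hypothetical stand-in for a DSv2 scan: a table name plus its predicates.
// Record equality compares the filter list element by element, so it is order-sensitive.
record ToyScan(String table, List<Pred> pushedFilters) {
    // Canonical form: the same scan with its filters sorted into a fixed order.
    ToyScan canonicalized() {
        List<Pred> sorted = new ArrayList<>(pushedFilters);
        sorted.sort(Pred.ORDER);
        return new ToyScan(table, List.copyOf(sorted));
    }

    // Two scans are treated as interchangeable when their canonical forms are equal.
    boolean sameResult(ToyScan other) {
        return canonicalized().equals(other.canonicalized());
    }
}

public class CanonicalScanDemo {
    public static void main(String[] args) {
        ToyScan s1 = new ToyScan("t", List.of(new Pred("a", ">", "1"), new Pred("b", "=", "x")));
        ToyScan s2 = new ToyScan("t", List.of(new Pred("b", "=", "x"), new Pred("a", ">", "1")));
        System.out.println(s1.equals(s2));     // false: same predicates, different order
        System.out.println(s1.sameResult(s2)); // true: canonical forms match
    }
}
```

Sorting into a total order is only one possible normalization; an order-insensitive set comparison of the predicates would be an alternative design.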
was:
Query optimization rules such as MergeScalarSubqueries check whether two plans are
identical by [comparing their canonicalized form|#L219]. For DSv2, the
comparison goes down to DataSourceV2ScanRelation in the hierarchy, which
currently lacks a canonicalize function.
This ticket aims to add a doCanonicalize function for DataSourceV2ScanRelation,
as well as to the Scan interface. The reason is that two identical scans may have
predicates in a different order during QO rewrite. As a reference, [FileScan
normalizes filters in def
equals()|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala#L107]
> Add canonicalization for dsv2 scan
> ----------------------------------
>
> Key: SPARK-53809
> URL: https://issues.apache.org/jira/browse/SPARK-53809
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.1.0
> Reporter: Yuchuan Huang
> Priority: Major
>
> Query optimization rules such as MergeScalarSubqueries check whether two plans are
> identical by [comparing their canonicalized form|#L219]. For DSv2, the
> comparison goes down to DataSourceV2ScanRelation in the plan hierarchy, which
> currently lacks a canonicalize function.
>
> This ticket aims to add a doCanonicalize function for DataSourceV2ScanRelation,
> as well as to the Scan interface. The reason is that two otherwise identical scans may
> carry their predicates in a different order after query optimization rewrites, so the
> plans fail to match. As a reference, [FileScan
> normalizes filters in def equals()|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala#L107]
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)