[
https://issues.apache.org/jira/browse/SPARK-51732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated SPARK-51732:
-----------------------------------
Labels: pull-request-available (was: )
> Apply `rpad` on attributes with same `ExprId` if they need to be deduplicated
> ------------------------------------------------------------------------------
>
> Key: SPARK-51732
> URL: https://issues.apache.org/jira/browse/SPARK-51732
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0, 4.1.0
> Reporter: Mihailo Timotic
> Priority: Major
> Labels: pull-request-available
>
> We need to apply `rpad` on attributes that have the same `ExprId` if those
> attributes should be deduplicated.
> For example:
> {code:java}
> CREATE OR REPLACE TABLE t(a CHAR(50)); {code}
> {code:java}
> SELECT t1.a
> FROM t t1
> WHERE (SELECT count(*) AS item_cnt FROM t t2 WHERE (t1.a = t2.a)) > 0
> {code}
> In the above case, `ApplyCharTypePadding` first runs on the subquery, where
> `t1.a` and `t2.a` reference the same `ExprId`, so `rpad` is not applied.
> After `DeduplicateRelations` runs on the outer query, `t1.a` and `t2.a`
> get different `ExprId`s and therefore need `rpad`. The padding is never
> added, because `ApplyCharTypePadding` for the outer query does not
> recurse into the subquery.
> On the other hand, for a query:
> {code:java}
> SELECT t1.a
> FROM t t1, t t2
> WHERE t1.a = t2.a {code}
> `ApplyCharTypePadding` will correctly add `rpad` to both `t1.a` and `t2.a`
> because attributes will first be deduplicated.
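
The ordering problem described above can be illustrated with a small toy
model. This is only a sketch, not Spark's implementation: `Attr`,
`apply_char_padding` and `deduplicate` are hypothetical stand-ins for the
attribute references, the `ApplyCharTypePadding` check and the
`DeduplicateRelations` step named in the description, and the rule is
simplified to "pad both sides unless they share an `ExprId`".

```python
from dataclasses import dataclass, replace
from itertools import count

# Hypothetical ExprId generator; real ExprIds are allocated by Spark.
_ids = count(1)


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for an attribute reference: qualifier, name, ExprId."""
    qualifier: str
    name: str
    expr_id: int

    def sql(self) -> str:
        return f"{self.qualifier}.{self.name}"


def apply_char_padding(left: Attr, right: Attr, length: int = 50):
    """Simplified rule: skip padding when both sides share an ExprId,
    otherwise wrap both sides in rpad (assumed CHAR(50), as in table t)."""
    if left.expr_id == right.expr_id:
        return (left.sql(), right.sql())
    return (f"rpad({left.sql()}, {length})", f"rpad({right.sql()}, {length})")


def deduplicate(attr: Attr) -> Attr:
    """Toy deduplication: hand the attribute a fresh ExprId."""
    return replace(attr, expr_id=next(_ids))


# Both aliases of t start out pointing at the same ExprId.
shared_id = next(_ids)
t1_a = Attr("t1", "a", shared_id)
t2_a = Attr("t2", "a", shared_id)

# Subquery case: the padding check runs before deduplication, so it sees
# identical ExprIds and leaves the comparison unpadded.
padding_first = apply_char_padding(t1_a, t2_a)

# Join case: deduplication runs first, so the check sees distinct ExprIds.
t2_a_dedup = deduplicate(t2_a)
dedup_first = apply_char_padding(t1_a, t2_a_dedup)

print(padding_first)
print(dedup_first)
```

In this model, the first result stays unpadded even though `t2.a` later
receives a new `ExprId`, which mirrors the subquery case in the description;
the second result is padded on both sides, as in the join case.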
--
This message was sent by Atlassian Jira
(v8.20.10#820010)