[
https://issues.apache.org/jira/browse/SPARK-57727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Yang updated SPARK-57727:
------------------------------
Description:
`QueryPlanConstraints.inferAdditionalConstraints` infers a predicate on b from
a = b and a predicate on a by substituting attribute a with b. Under a
non-binary-stable collation, a = b is a collation equality (e.g. 'hello' =
'HELLO' under UTF8_LCASE), not byte equality, so a and b are not
interchangeable in other predicates. The substitution is therefore unsound and
silently drops rows.
Repro:
{code:sql}
CREATE TABLE t (a STRING COLLATE UTF8_LCASE, b STRING COLLATE UTF8_LCASE);
INSERT INTO t VALUES ('hello', 'HELLO');
SELECT a, b FROM t WHERE a = b AND a = 'hello' COLLATE UTF8_BINARY;
{code}
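Returns no rows with constraint propagation enabled
(spark.sql.constraintPropagation.enabled, on by default); should return
('hello','HELLO'). The inferred predicate b = 'hello' COLLATE UTF8_BINARY is
false for this row because b is 'HELLO'.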
Same class of bug as SPARK-55647 (fixed in ConstantPropagation); this sibling
rule was left unguarded.
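A minimal sketch of why the substitution is unsound, written in plain Python
rather than against Spark itself. The helpers `lcase_eq` and `binary_eq` are
hypothetical stand-ins for UTF8_LCASE and UTF8_BINARY equality (UTF8_LCASE is
approximated here with `str.lower()`), and the row mirrors the repro above.

```python
# Hypothetical stand-ins for the two collations; not Spark APIs.
def lcase_eq(x: str, y: str) -> bool:
    """Approximates UTF8_LCASE equality: compare after lowercasing."""
    return x.lower() == y.lower()


def binary_eq(x: str, y: str) -> bool:
    """Approximates UTF8_BINARY equality: compare the UTF-8 bytes."""
    return x.encode("utf-8") == y.encode("utf-8")


# The single row inserted by the repro.
row = {"a": "hello", "b": "HELLO"}


def original_filter(r: dict) -> bool:
    # WHERE a = b AND a = 'hello' COLLATE UTF8_BINARY
    return lcase_eq(r["a"], r["b"]) and binary_eq(r["a"], "hello")


def inferred_constraint(r: dict) -> bool:
    # Obtained by replacing a with b in the second predicate:
    # b = 'hello' COLLATE UTF8_BINARY
    return binary_eq(r["b"], "hello")


def filter_with_inference(r: dict) -> bool:
    # The original filter with the inferred predicate added to it.
    return original_filter(r) and inferred_constraint(r)


original_keeps = original_filter(row)
inferred_keeps = filter_with_inference(row)
print("original filter keeps row:", original_keeps)
print("filter plus inferred constraint keeps row:", inferred_keeps)
```

In this model the original filter keeps the row, while the filter with the
inferred predicate rejects it. The substitution would only be safe if a = b
implied byte equality, which is the property a guard on non-binary-stable
collations would need to check.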
> Fix inferAdditionalConstraints incorrectly substituting attributes with
> non-binary-stable collations
> ----------------------------------------------------------------------------------------------------
>
> Key: SPARK-57727
> URL: https://issues.apache.org/jira/browse/SPARK-57727
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Eric Yang
> Priority: Major