[ 
https://issues.apache.org/jira/browse/SPARK-57727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated SPARK-57727:
------------------------------
    Description: 
`QueryPlanConstraints.inferAdditionalConstraints` derives a predicate on b from 
a = b and a predicate on a by substituting attribute a with b. Under a 
non-binary-stable collation, a = b is a collation equality (e.g. 'hello' = 
'HELLO' under UTF8_LCASE), not byte equality, so the substitution is unsound: 
the inferred predicate on b can reject rows that satisfy the original filter, 
and those rows are silently dropped.
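
A minimal sketch of why the substitution is unsound, modeled in plain Python 
rather than Spark. The helpers `lcase_eq` and `binary_eq` are hypothetical, and 
UTF8_LCASE equality is only approximated by comparing lowercased strings.

```python
# Toy model, not Spark code: UTF8_LCASE equality is approximated with str.lower().
def lcase_eq(x: str, y: str) -> bool:
    return x.lower() == y.lower()


def binary_eq(x: str, y: str) -> bool:
    return x == y


row = {"a": "hello", "b": "HELLO"}

# Original filter: a = b (UTF8_LCASE) AND a = 'hello' (UTF8_BINARY).
original = lcase_eq(row["a"], row["b"]) and binary_eq(row["a"], "hello")

# Inferred constraint: attribute a replaced by b in the binary predicate.
inferred = binary_eq(row["b"], "hello")

# The row passes the original filter but fails once the inferred
# constraint is added to it.
keeps_row_original = original
keeps_row_with_inference = original and inferred
```

In this model the row passes the original filter, but adding the inferred 
predicate on b filters it out, which matches the empty result in the repro.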

Repro:
{code:sql}
CREATE TABLE t (a STRING COLLATE UTF8_LCASE, b STRING COLLATE UTF8_LCASE);
INSERT INTO t VALUES ('hello', 'HELLO');
SELECT a, b FROM t WHERE a = b AND a = 'hello' COLLATE UTF8_BINARY;
{code}
Returns no rows with constraint propagation enabled (default); should return 
('hello','HELLO').

Same class as SPARK-55647 (fixed in ConstantPropagation); this sibling rule was 
left unguarded.
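
One possible shape for such a guard, sketched as a toy Python model and not as 
Spark's actual API: derive the substituted predicate only when the equality's 
collation is binary-stable. The names `BINARY_STABLE` and 
`infer_from_equality` are hypothetical, and the set contents are an assumption 
made for this sketch.

```python
from typing import Optional, Tuple

# Toy model, not Spark code.
# Assumption: only UTF8_BINARY is treated as binary-stable in this sketch.
BINARY_STABLE = {"UTF8_BINARY"}

Predicate = Tuple[str, str, str]  # (column, operator, literal)


def infer_from_equality(eq_collation: str,
                        predicate_on_a: Predicate) -> Optional[Predicate]:
    """Rewrite a predicate on a onto b, or return None when that is unsafe."""
    if eq_collation not in BINARY_STABLE:
        # a = b is not byte equality here, so substituting a with b is unsound.
        return None
    _, op, literal = predicate_on_a
    return ("b", op, literal)


skipped = infer_from_equality("UTF8_LCASE", ("a", "=", "hello"))
derived = infer_from_equality("UTF8_BINARY", ("a", "=", "hello"))
```

With this guard the UTF8_LCASE equality yields no extra constraint, while a 
byte-level equality still allows the predicate to be carried over to b.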

> Fix inferAdditionalConstraints incorrectly substituting attributes with 
> non-binary-stable collations
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57727
>                 URL: https://issues.apache.org/jira/browse/SPARK-57727
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 5.0.0
>            Reporter: Eric Yang
>            Priority: Major


