[
https://issues.apache.org/jira/browse/SPARK-24395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16494181#comment-16494181
]
Juliusz Sompolski commented on SPARK-24395:
-------------------------------------------
The question is whether the literals should be treated as structs, or unpacked?
If like structs, then the current behavior is correct, I think.
But when a similar query is IN / NOT IN subquery, it is currently treated as if
the left hand side was unpacked into independent columns.
cc [~mgaido] [~hvanhovell]
> Fix Behavior of NOT IN with Literals Containing NULL
> ----------------------------------------------------
>
> Key: SPARK-24395
> URL: https://issues.apache.org/jira/browse/SPARK-24395
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.2
> Reporter: Miles Yucht
> Priority: Major
>
> Spark does not return the correct answer when evaluating NOT IN in some
> cases. For example:
> {code:java}
> CREATE TEMPORARY VIEW m AS SELECT * FROM VALUES
> (null, null)
> AS m(a, b);
> SELECT *
> FROM m
> WHERE a IS NULL AND b IS NULL
> AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2,
> 1))));{code}
> According to the semantics of null-aware anti-join, this should return no
> rows. However, it actually returns the row {{NULL NULL}}. This was found by
> inspecting the unit tests added for SPARK-24381
> ([https://github.com/apache/spark/pull/21425#pullrequestreview-123421822).]
> *Acceptance Criteria*:
> * We should be able to add the following test cases back to
> {{subquery/in-subquery/not-in-unit-test-multi-column-literal.sql}}:
> {code:java}
> -- Case 2
> -- (subquery contains a row with null in all columns -> row not returned)
> SELECT *
> FROM m
> WHERE (a, b) NOT IN ((CAST (null AS INT), CAST (null AS DECIMAL(2, 1))));
> -- Case 3
> -- (probe-side columns are all null -> row not returned)
> SELECT *
> FROM m
> WHERE a IS NULL AND b IS NULL -- Matches only (null, null)
> AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2,
> 1))));
> -- Case 4
> -- (one column null, other column matches a row in the subquery result ->
> row not returned)
> SELECT *
> FROM m
> WHERE b = 1.0 -- Matches (null, 1.0)
> AND (a, b) NOT IN ((0, 1.0), (2, 3.0), (4, CAST(null AS DECIMAL(2,
> 1))));
> {code}
>
> cc [~smilegator] [~juliuszsompolski]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]