[
https://issues.apache.org/jira/browse/SPARK-46446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867132#comment-17867132
]
Andy Lam commented on SPARK-46446:
----------------------------------
[~cloud_fan] Could we unresolve this ticket or create a new one? Decorrelate
subqueries with correlation under LIMIT with OFFSET hasn't been fixed, just
disabled.
> Correctness bug in correlated subquery with OFFSET
> --------------------------------------------------
>
> Key: SPARK-46446
> URL: https://issues.apache.org/jira/browse/SPARK-46446
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Jack Chen
> Assignee: Jack Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Subqueries with correlation under LIMIT with OFFSET have a correctness bug,
> introduced recently when support for correlation under OFFSET was enabled but
> were not handled correctly. (So we went from unsupported, query throws error
> -> wrong results.)
> It’s a bug in all types of correlated subqueries: scalar, lateral, IN, EXISTS
>
> It's easy to repro with a query like
> {code:java}
> create table x(x1 int, x2 int);
> insert into x values (1, 1), (2, 2);
> create table y(y1 int, y2 int);
> insert into y values (1, 1), (1, 2), (2, 4);
> select * from x where exists (select * from y where x1 = y1 limit 1 offset
> 2){code}
> Correct result: empty set, see postgres:
> [https://www.db-fiddle.com/f/dtXNn7hwDnemiCTUhvwgYM/0]
> Spark result: Array([2,2])
>
> The
> [PR|https://github.com/apache/spark/pull/43111/files/324a106611e6d62c31535cfc43863fdaa16e5dda#diff-583171e935b2dc349378063a5841c5b98b30a2d57ac3743a9eccfe7bffcb8f2aR1403]
> where it was introduced added a test for it, but the golden file results for
> the test actually were incorrect and we didn't notice. (The bug was initially
> found by https://github.com/apache/spark/pull/44084)
> I'll work on both:
> * Adding support for offset in DecorrelateInnerQuery (the transformation is
> into a filter on row_number window function, similar to limit).
> * Adding a feature flag to enable/disable offset in subquery support
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]