Jack Chen created SPARK-46446:
---------------------------------

             Summary: Correctness bug in correlated subquery with OFFSET
                 Key: SPARK-46446
                 URL: https://issues.apache.org/jira/browse/SPARK-46446
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Jack Chen


Subqueries with correlation under LIMIT with OFFSET have a correctness bug. It
was introduced recently, when support for correlation under OFFSET was enabled
but OFFSET was not handled correctly. (So we went from unsupported, where the
query throws an error, to wrong results.)

It affects all types of correlated subqueries: scalar, lateral, IN, and EXISTS.

It's easy to repro with a query like
{code:sql}
SELECT *
FROM   emp
JOIN LATERAL (SELECT dept.dept_name
              FROM   dept
              WHERE  emp.dept_id = dept.dept_id
              LIMIT  5 OFFSET 3);
{code}
The
[PR|https://github.com/apache/spark/pull/43111/files/324a106611e6d62c31535cfc43863fdaa16e5dda#diff-583171e935b2dc349378063a5841c5b98b30a2d57ac3743a9eccfe7bffcb8f2aR1403]
that introduced it added a test for this case, but the golden file results for
the test were incorrect and we didn't notice.

I'll work on both:
 * Adding support for offset in DecorrelateInnerQuery (the transformation turns
the offset into a filter on a row_number window function, similar to how limit
is handled).
 * Adding a feature flag to enable/disable offset in subquery support.
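
As a rough illustration of the first item, the repro query above could be
decorrelated by hand along the following lines. This is only a sketch of the
idea, not the plan that DecorrelateInnerQuery actually produces. It assumes the
subquery is ordered by dept.dept_name (the repro has no ORDER BY, so its row
order is not deterministic), and the alias names d and rn are made up for the
example. LIMIT 5 OFFSET 3 keeps rows 4 through 8 of each correlated group, which
becomes the filter rn > 3 AND rn <= 3 + 5:
{code:sql}
-- Hand-written equivalent; assumes ORDER BY dept_name inside the subquery.
SELECT emp.*, d.dept_name
FROM   emp
JOIN   (SELECT dept_id,
               dept_name,
               -- Number the rows within each correlation key (dept_id).
               row_number() OVER (PARTITION BY dept_id
                                  ORDER BY dept_name) AS rn
        FROM   dept) d
ON     emp.dept_id = d.dept_id
-- OFFSET 3 skips rn 1..3; LIMIT 5 then keeps rn 4..8.
WHERE  d.rn > 3 AND d.rn <= 3 + 5;
{code}
With this shape, an employee whose department has three or fewer matching rows
produces no output rows, which matches what the inner lateral join should return.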


