Jack Chen created SPARK-46446:
---------------------------------
Summary: Correctness bug in correlated subquery with OFFSET
Key: SPARK-46446
URL: https://issues.apache.org/jira/browse/SPARK-46446
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 4.0.0
Reporter: Jack Chen
Subqueries with correlation under LIMIT with OFFSET have a correctness bug.
It was introduced recently, when support for correlation under OFFSET was
enabled but such queries were not handled correctly during decorrelation. (So
these queries went from unsupported, where the query throws an error, to
returning wrong results.)
It's a bug in all types of correlated subqueries: scalar, lateral, IN, and EXISTS.
It's easy to reproduce with a query like:
{code:sql}
SELECT *
FROM emp
JOIN LATERAL (SELECT dept.dept_name
              FROM dept
              WHERE emp.dept_id = dept.dept_id
              LIMIT 5 OFFSET 3);
{code}
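For reference, the expected semantics can be modeled in a few lines of Python: the lateral subquery is evaluated once per emp row, so OFFSET and LIMIT apply to each row's matching dept rows independently. This is a hypothetical sketch, not Spark code. The emp_name column and the sample data are made up, and row order inside each subquery is taken from list order (in SQL it is unspecified without ORDER BY).

{code:python}
from typing import NamedTuple


class Emp(NamedTuple):
    emp_name: str  # hypothetical column, for illustration only
    dept_id: int


class Dept(NamedTuple):
    dept_id: int
    dept_name: str


def lateral_limit_offset(emps, depts, limit, offset):
    """Reference semantics: evaluate the correlated subquery per outer row.

    For each emp row, take the dept rows with a matching dept_id, skip the
    first `offset` of them, then keep at most `limit`. Inner row order is
    the list order here (unspecified in SQL without ORDER BY).
    """
    result = []
    for emp in emps:
        # WHERE emp.dept_id = dept.dept_id, evaluated for this emp row only
        matches = [d for d in depts if d.dept_id == emp.dept_id]
        # OFFSET then LIMIT, applied to this row's matches only
        for d in matches[offset:offset + limit]:
            # inner (lateral) join: emp rows with no remaining matches vanish
            result.append((emp.emp_name, d.dept_name))
    return result
{code}

If dept_id is unique in dept, each subquery returns at most one row, so with OFFSET 3 the correct result of the repro query is empty.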
The
[PR|https://github.com/apache/spark/pull/43111/files/324a106611e6d62c31535cfc43863fdaa16e5dda#diff-583171e935b2dc349378063a5841c5b98b30a2d57ac3743a9eccfe7bffcb8f2aR1403]
that introduced it added a test for this case, but the golden file results for
that test were actually incorrect, and we didn't notice.
I'll work on both:
* Adding support for offset in DecorrelateInnerQuery (the transformation
rewrites it into a filter on a row_number window function, similar to limit).
* Adding a feature flag to enable/disable support for offset in subqueries.
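A minimal Python sketch of the row_number-based rewrite described in the first bullet, assuming the only correlation is the equality emp.dept_id = dept.dept_id: number the inner rows per correlation key, keep those with offset < rn <= offset + limit, then join back to the outer rows. This models the idea only and is not the actual DecorrelateInnerQuery code; the helper names, the emp_name column, and the exact filter bounds are assumptions for illustration.

{code:python}
from collections import defaultdict
from typing import NamedTuple


class Emp(NamedTuple):
    emp_name: str  # hypothetical column, for illustration only
    dept_id: int


class Dept(NamedTuple):
    dept_id: int
    dept_name: str


def row_number_by(rows, key):
    """Attach a 1-based row number within each partition, like
    ROW_NUMBER() OVER (PARTITION BY key); order is list order here."""
    counters = defaultdict(int)
    numbered = []
    for row in rows:
        k = key(row)
        counters[k] += 1
        numbered.append((row, counters[k]))
    return numbered


def decorrelated_limit_offset(emps, depts, limit, offset):
    """Hypothetical decorrelated plan: window + filter on the inner side, then a join."""
    # 1. Number inner rows per correlation key (dept_id).
    numbered = row_number_by(depts, key=lambda d: d.dept_id)
    # 2. Keep offset < rn <= offset + limit, the per-key OFFSET/LIMIT range.
    kept = defaultdict(list)
    for d, rn in numbered:
        if offset < rn <= offset + limit:
            kept[d.dept_id].append(d)
    # 3. Join the filtered inner rows back to the outer rows on dept_id.
    return [(e.emp_name, d.dept_name) for e in emps for d in kept[e.dept_id]]
{code}

With offset = 0 the filter reduces to rn <= limit, i.e. the plain LIMIT case. Because the window is partitioned by the correlation key, every emp row sharing a dept_id sees the same OFFSET/LIMIT slice, just as if the subquery were evaluated once per emp row.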
--
This message was sent by Atlassian Jira
(v8.20.10#820010)