mingmwang commented on issue #5808: URL: https://github.com/apache/arrow-datafusion/issues/5808#issuecomment-1491736320
Sure, I will take a look. It is tricky to support de-correlating correlated IN/EXISTS subqueries that contain `Limit`/`OrderBy` clauses. As I remember, SparkSQL reports an error in this case. PostgreSQL does not report an error, but it also does not de-correlate the subquery and simply keeps it nested. Hyper is more advanced in handling this. If we want to support this in DataFusion, we need to think through and test all the different cases carefully:

-- Expected behavior: can be de-correlated, limit must be removed
explain SELECT t1.id, t1.name FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id limit 1);

-- Expected behavior: can be de-correlated, should keep the inner limit and must remove the outer limit
explain SELECT t1.id, t1.name FROM t1 WHERE EXISTS (SELECT * FROM (SELECT * FROM t2 limit 10) as t2 WHERE t2.id = t1.id limit 1);

-- Expected behavior: can be de-correlated, must keep the limit
explain SELECT t1.id, t1.name FROM t1 WHERE t1.id in (SELECT t2.id FROM t2 limit 10);

-- Expected behavior: can not be de-correlated, must keep limit
explain SELECT t1.id, t1.name FROM t1 WHERE t1.id in (SELECT t2.id FROM t2 where t1.name = t2.name limit 10);
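The sketch below checks the semantics behind these expectations. It uses Python's built-in `sqlite3` as a stand-in engine (not DataFusion) and runs against hypothetical toy data. Several parts are assumptions made only for illustration: the `LIMIT` values are shrunk to 1 or 2 so their effect is visible on a tiny table, an `ORDER BY` is added where a limited row must be deterministic, and the "naive" queries show wrong rewrites rather than what any optimizer actually produces. The key point is that a `LIMIT` inside a correlated subquery applies once per outer row. Hoisting it into a join applies it once to the whole inner table.

```python
import sqlite3

# Hypothetical toy data; t1 ids are unique and t2 names are unique.
SCHEMA = """
CREATE TABLE t1 (id INTEGER, name TEXT);
CREATE TABLE t2 (id INTEGER, name TEXT);
INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO t2 VALUES (1, 'a'), (1, 'x'), (2, 'b'), (2, 'y'), (4, 'd');
"""


def run(conn, sql):
    """Execute a query and return its rows sorted, so comparisons ignore row order."""
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Case 1: EXISTS only asks whether at least one row matches, so LIMIT n (n >= 1)
# directly under it is redundant and can be dropped before de-correlation.
exists_with_limit = run(conn, """
    SELECT t1.id, t1.name FROM t1
    WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id LIMIT 1)""")
exists_without_limit = run(conn, """
    SELECT t1.id, t1.name FROM t1
    WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id)""")
# Wrong rewrite: keeping the LIMIT in a semi join limits all of t2 at once.
naive_semi_join = run(conn, """
    SELECT t1.id, t1.name FROM t1
    WHERE t1.id IN (SELECT t2.id FROM t2 LIMIT 1)""")

# Case 2: the inner LIMIT runs on t2 before any correlation, so it must stay;
# only the outer LIMIT 1 directly under EXISTS is redundant.
nested = run(conn, """
    SELECT t1.id, t1.name FROM t1
    WHERE EXISTS (SELECT * FROM (SELECT * FROM t2 ORDER BY id LIMIT 2) AS s
                  WHERE s.id = t1.id LIMIT 1)""")
nested_outer_removed = run(conn, """
    SELECT t1.id, t1.name FROM t1
    WHERE EXISTS (SELECT * FROM (SELECT * FROM t2 ORDER BY id LIMIT 2) AS s
                  WHERE s.id = t1.id)""")

# Case 3: an uncorrelated IN subquery is evaluated once, so its LIMIT is part
# of the query's meaning and must be kept.
in_with_limit = run(conn, """
    SELECT t1.id, t1.name FROM t1
    WHERE t1.id IN (SELECT t2.id FROM t2 ORDER BY t2.id LIMIT 1)""")
in_without_limit = run(conn, """
    SELECT t1.id, t1.name FROM t1
    WHERE t1.id IN (SELECT t2.id FROM t2)""")

# Case 4: with a correlated predicate the LIMIT applies per outer row, so it
# cannot be hoisted above a join on the correlation columns.
correlated_in = run(conn, """
    SELECT t1.id, t1.name FROM t1
    WHERE t1.id IN (SELECT t2.id FROM t2 WHERE t1.name = t2.name LIMIT 1)""")
naive_join = run(conn, """
    SELECT t1.id, t1.name FROM t1
    JOIN (SELECT id, name FROM t2 LIMIT 1) AS s
      ON s.id = t1.id AND s.name = t1.name""")

print("case 1:", exists_with_limit, exists_without_limit, naive_semi_join)
print("case 2:", nested, nested_outer_removed)
print("case 3:", in_with_limit, in_without_limit)
print("case 4:", correlated_in, naive_join)
```

Cases 1 and 2 return identical rows with and without the `LIMIT` that sits directly under `EXISTS`. Cases 3 and 4 change their results when the `LIMIT` is dropped or hoisted, which matches the "must keep the limit" expectations above.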
