[GitHub] [arrow-datafusion] mingmwang commented on issue #5808: `decorrelate_where_in` reports error when optimizing `limit subquery`

via GitHub Fri, 31 Mar 2023 03:54:23 -0700


mingmwang commented on issue #5808:
URL: 
https://github.com/apache/arrow-datafusion/issues/5808#issuecomment-1491736320


   Sure, I will take a look. It is tricky to decorrelate correlated 
IN/EXISTS subqueries that contain `Limit`/`OrderBy` clauses. As I remember, 
SparkSQL reports an error in this case, while PostgreSQL does not report an 
error but does not decorrelate and just keeps the nested subquery. Hyper is 
more advanced and can handle this.
   
   In DataFusion, if we want to support this, we need to think through and 
test all the different cases carefully:
   
   -- Expected behavior: can be de-correlated, limit must be removed
   explain  
   SELECT t1.id, t1.name FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t2.id = t1.id limit 1);
   
   -- Expected behavior: can be de-correlated, should keep the inner limit and
   -- must remove the outer limit
   explain
   SELECT t1.id, t1.name FROM t1 WHERE EXISTS (SELECT * FROM (SELECT * FROM t2 limit 10) as t2 WHERE t2.id = t1.id limit 1);
   
   -- Expected behavior: can be de-correlated, must keep the limit
   explain 
   SELECT t1.id, t1.name FROM t1 WHERE t1.id in (SELECT t2.id FROM t2 limit 10);
   
   -- Expected behavior: cannot be de-correlated, must keep the limit
   explain
   SELECT t1.id, t1.name FROM t1 WHERE t1.id in (SELECT t2.id FROM t2 WHERE t1.name = t2.name limit 10);
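
   The first and third cases can be checked on a toy data set. The sketch below is only an illustration under assumptions: it uses Python's built-in `sqlite3` module as a stand-in engine (not DataFusion), the rows in `t1` and `t2` are made up, and the `IN` case uses `LIMIT 1` instead of `LIMIT 10` so that the effect of the limit is visible with three rows.

```python
import sqlite3

# Hypothetical toy data; SQLite is only a stand-in engine, not DataFusion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, name TEXT);
    CREATE TABLE t2 (id INTEGER, name TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'a'), (1, 'x'), (2, 'b');
""")


def ids(sql):
    """Run a query and return the first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))


# Case 1: EXISTS only asks whether at least one row matches,
# so a LIMIT 1 inside the subquery does not change the result.
exists_with_limit = ids(
    "SELECT t1.id FROM t1 WHERE EXISTS "
    "(SELECT * FROM t2 WHERE t2.id = t1.id LIMIT 1)"
)
exists_no_limit = ids(
    "SELECT t1.id FROM t1 WHERE EXISTS "
    "(SELECT * FROM t2 WHERE t2.id = t1.id)"
)

# Case 3: for an uncorrelated IN, the LIMIT restricts the candidate set,
# so dropping it can return more rows.
in_with_limit = ids(
    "SELECT t1.id FROM t1 WHERE t1.id IN (SELECT t2.id FROM t2 LIMIT 1)"
)
in_no_limit = ids(
    "SELECT t1.id FROM t1 WHERE t1.id IN (SELECT t2.id FROM t2)"
)

print(exists_with_limit == exists_no_limit)   # the limit is redundant here
print(len(in_with_limit), len(in_no_limit))   # the limit changes the row count
```

   Only row counts are compared for the `IN` case, because which row a `LIMIT` without `ORDER BY` returns is not guaranteed.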
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
