jiangzhx commented on issue #5789:
URL: 
https://github.com/apache/arrow-datafusion/issues/5789#issuecomment-1498753570

   @crepererum I think you're right, just your PR  let this issue expose.
   
   these Exists, InSubquery, ScalarSubquery Expr did not complete the 
corresponding processing 
   in the create_physical_expr method of planner.rs.
   
https://github.com/apache/arrow-datafusion/blob/a1c60a1ba98e089d7551637f2a78663e66772d88/datafusion/physical-expr/src/planner.rs#L501-L503
   
   Before your PR takes effect, it also does not actually handle the subquery 
scene at datasource 
   
https://github.com/apache/arrow-datafusion/blob/a1c60a1ba98e089d7551637f2a78663e66772d88/datafusion/core/src/datasource/file_format/mod.rs#L83-L89
   
   so, i think the way is to optimize non-correlated subquery at 
decorrelate_where_exists optimizer
   
https://github.com/apache/arrow-datafusion/blob/a1c60a1ba98e089d7551637f2a78663e66772d88/datafusion/optimizer/src/decorrelate_where_exists.rs#L185-L191
   
   the current decorrelate_where_exists only optimize
   ```
   SELECT t1.id FRO
   WHERE exists
   (
      SELECT t2.id 
   )
   /// and optimizes it into:
   SELECT t1.id
   FROM t1 LEFT SEM
   JOIN t2
   ON t1.id = t2.id
   ```
   may be we need add more rules to decorrelate_where_exists
   ```
   ///Rewrite non correlated exists subquery to use ScalarSubquery
   WHERE EXISTS (SELECT A FROM TABLE B WHERE COL1 > 10)
   ///will be rewritten to
   WHERE (SELECT 1 FROM (SELECT A FROM TABLE B WHERE COL1 > 10) LIMIT 1) IS NOT 
NULL
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to