jiangzhx commented on PR #5907:
URL: https://github.com/apache/arrow-datafusion/pull/5907#issuecomment-1513437056
> I will take a look. Usually, for uncorrelated scalar subqueries, it will be kept as a subquery, but it needs a count check.
>
> WHERE EXISTS (SELECT b FROM t2 where a>1 )
>
> rewrite to
>
> WHERE ScalarSubQuery(SELECT b FROM t2 where a>1 limit 1)
>
> We need to add a physical SubQueryExec.

Yes, you are right. As you said, in Spark, rewriting `exists` to a `scalar subquery` solves the query requirement for this scenario.
```scala
/**
 * Rewrite a non-correlated EXISTS subquery to use ScalarSubquery:
 *   WHERE EXISTS (SELECT A FROM TABLE B WHERE COL1 > 10)
 * will be rewritten to
 *   WHERE (SELECT 1 FROM (SELECT A FROM TABLE B WHERE COL1 > 10) LIMIT 1) IS NOT NULL
 */
object RewriteNonCorrelatedExists extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan =
    plan.transformAllExpressionsWithPruning(_.containsPattern(EXISTS_SUBQUERY)) {
      case exists: Exists if exists.children.isEmpty =>
        IsNotNull(
          ScalarSubquery(
            plan = Limit(Literal(1), Project(Seq(Alias(Literal(1), "col")()), exists.plan)),
            exprId = exists.exprId,
            hint = exists.hint))
    }
}
```
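
To make the effect concrete, this is roughly what the rewrite looks like in plain SQL for the example query in this thread (a sketch only; the outer `SELECT * FROM t1` is a hypothetical wrapper added here just to make the fragment a complete statement):

```sql
-- Original form of the predicate from the example above
-- (t1 is a hypothetical outer table for illustration):
SELECT * FROM t1 WHERE EXISTS (SELECT b FROM t2 WHERE a > 1);

-- After the Spark-style rewrite: the EXISTS becomes a scalar subquery
-- limited to one row, and the predicate checks it with IS NOT NULL:
SELECT * FROM t1 WHERE (SELECT 1 AS col FROM t2 WHERE a > 1 LIMIT 1) IS NOT NULL;
```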