askoa opened a new issue, #4887:
URL: https://github.com/apache/arrow-datafusion/issues/4887

   **Describe the bug**
   While analyzing  the Logical Plan for TPCH-DS query 10 to find the cause of 
the issue  #4795, I found another issue with the Logical Plan generated. I 
noticed that the below subquery disappeared from Logical Plan.
   
   ```
       exists (select * 
               from catalog_sales,date_dim
               where c.c_customer_sk = cs_ship_customer_sk and
                     cs_sold_date_sk = d_date_sk and
                     d_year = 2002 and
                     d_moy between 4 and 4+3)
   ```
   
   **To Reproduce**
   Enable log tracing and run the test `tpcds_logical_q10` or 
`tpcds_physical_q10`. Check the logs for Logical Plan generated.
   
   **Expected behavior**
   The subquery should not disappear from Logical Plan.
   
   **Additional context**
   I did some additional analysis and found that the rule 
`common_sub_expression_eliminate` removes one of the `EXIST (<subquery>)`. 
Below are detailed steps
   
   logical plan before `decorrelate_where_exists` -> 
   ```
   ...
   Filter: ... EXISTS(<subquery>) AND (EXISTS(<subquery>) OR EXISTS(<subquery>))
   ...
   ```
   
   the rule `decorrelate_where_exists` mutates the first `EXIST` into a 
`LeftSemi Join`
   
   logical plan after `decorrelate_where_exists ` and before 
`common_sub_expression_eliminate` -> 
   ```
   ...
   Filter: ... (EXISTS(<subquery>) OR EXISTS(<subquery>))
       LeftSemi Join: ...
   ...
   ```
   
   the rule `common_sub_expression_eliminate` runs:
     - The rule maintains a HashMap of expressions to no. of occurrences. The 
expression `EXISTS(<subquery>)` is recognized as a single expression occurring 
twice.
   
   
https://github.com/apache/arrow-datafusion/blob/1844d39eb92f04e483095f491ff07da3a2f67f25/datafusion/optimizer/src/common_subexpr_eliminate.rs#L451-L454
   
     - The rule creates a projection that merges the two similar expressions 
and thus making the second subquery irrelevant.
   
   logical plan after `common_sub_expression_eliminate`:  
   ```
   ...
   Filter ...(EXISTS(<subquery>) OR EXISTS(<subquery>))...
       Projection: EXISTS<subquery>... 
           LefftSemi Join: ...
       LeftSemi Join: ...
   ...
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to