askoa commented on issue #4795:
URL: 
https://github.com/apache/arrow-datafusion/issues/4795#issuecomment-1380647966

   I found part of the problem. The rule `common_sub_expression_eliminate` 
swallows one of the `EXIST (<subquery>)`. Below are detailed steps
   
   logical plan before `decorrelate_where_exists` -> `... EXISTS(<subquery>) 
AND (EXISTS(<subquery>) OR EXISTS(<subquery>)) .... subquery: ... subquery: ... 
subquery: ...`
   
   the rule `decorrelate_where_exists` mutates the first `EXIST` into a 
`LeftSemi Join`
   
   logical plan after `decorrelate_where_exists ` and before 
`common_sub_expression_eliminate` -> `...(EXISTS(<subquery>) OR 
EXISTS(<subquery>))  ... subquery: ... subquery: ... LeftSemi Join: ...`
   
   the rule `common_sub_expression_eliminate` runs:
     - The rule maintains a HashMap of expressions to no. of occurrences. The 
expression `EXISTS(<subquery>)` is recognized as a single expression occurring 
twice.
   
   
https://github.com/apache/arrow-datafusion/blob/1844d39eb92f04e483095f491ff07da3a2f67f25/datafusion/optimizer/src/common_subexpr_eliminate.rs#L451-L454
   
     - The rule creates a projection that merges the two similar expressions 
and thus making the second subquery irrelevant.
   
   logical plan after `common_sub_expression_eliminate`:  
`...(EXISTS(<subquery>) OR EXISTS(<subquery>))  ... Projection: 
EXISTS<subquery>... subquery: ... LeftSemi Join: ...`
   
   As I am new to both sql optimization and to data fusion I would like to use 
this thread to discuss potential solutions. I assume adding unique identifiers 
or aliases to subqueries might be part of the solution. Adding unique 
identifiers or aliases will stop `common_sub_expression_elimitate` treating 
both subqueries as the same. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to