Jefffrey commented on issue #4844:
URL: 
https://github.com/apache/arrow-datafusion/issues/4844#issuecomment-1374664702

   Looks to be regression introduced by 
fddb3d3651041f41d66a801f10e27387e84374f7 
(https://github.com/apache/arrow-datafusion/pull/4562)
   
   On the commit prior to it (27921135e4ff4b644251db6ab42f1a25bd6523cb), I get 
this explain plan:
   
   ```sql
   
+------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type                                                  | plan          
                                                                                
                                                                          |
   
+------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | initial_logical_plan                                       | Sort: s.mark 
DESC NULLS FIRST                                                                
                                                                           |
   |                                                            |   Projection: 
s.name, s.mark, g.grade                                                         
                                                                          |
   |                                                            |     Filter: 
g.grade > Int64(2)                                                              
                                                                            |
   |                                                            |       Filter: 
s.mark BETWEEN g.min AND g.max                                                  
                                                                          |
   |                                                            |         
CrossJoin:                                                                      
                                                                                
|
   |                                                            |           
SubqueryAlias: s                                                                
                                                                              |
   |                                                            |             
TableScan: students                                                             
                                                                            |
   |                                                            |           
SubqueryAlias: g                                                                
                                                                              |
   |                                                            |             
TableScan: grades                                                               
                                                                            |
   | logical_plan after inline_table_scan                       | SAME TEXT AS 
ABOVE                                                                           
                                                                           |
   | logical_plan after type_coercion                           | SAME TEXT AS 
ABOVE                                                                           
                                                                           |
   | logical_plan after simplify_expressions                    | Sort: s.mark 
DESC NULLS FIRST                                                                
                                                                           |
   |                                                            |   Projection: 
s.name, s.mark, g.grade                                                         
                                                                          |
   |                                                            |     Filter: 
g.grade > Int64(2)                                                              
                                                                            |
   |                                                            |       Filter: 
s.mark >= g.min AND s.mark <= g.max                                             
                                                                          |
   |                                                            |         
CrossJoin:                                                                      
                                                                                
|
   |                                                            |           
SubqueryAlias: s                                                                
                                                                              |
   |                                                            |             
TableScan: students                                                             
                                                                            |
   |                                                            |           
SubqueryAlias: g                                                                
                                                                              |
   |                                                            |             
TableScan: grades                                                               
                                                                            |
   ...
   | logical_plan after eliminate_cross_join                    | SAME TEXT AS 
ABOVE                                                                           
                                                                           |
   ...
   | logical_plan after push_down_filter                        | Sort: s.mark 
DESC NULLS FIRST                                                                
                                                                           |
   |                                                            |   Projection: 
s.name, s.mark, g.grade                                                         
                                                                          |
   |                                                            |     Filter: 
s.mark >= g.min AND s.mark <= g.max                                             
                                                                            |
   |                                                            |       
CrossJoin:                                                                      
                                                                                
  |
   |                                                            |         
SubqueryAlias: s                                                                
                                                                                
|
   |                                                            |           
TableScan: students                                                             
                                                                              |
   |                                                            |         
SubqueryAlias: g                                                                
                                                                                
|
   |                                                            |           
Filter: grades.grade > Int64(2)                                                 
                                                                              |
   |                                                            |             
TableScan: grades, partial_filters=[grades.grade > Int64(2)]
   ...
   ```
   
   And on commit fddb3d3651041f41d66a801f10e27387e84374f7 I get this plan 
instead:
   
   ```sql
   
+------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type                                                  | plan          
                                                                                
                                                             |
   
+------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | initial_logical_plan                                       | Sort: s.mark 
DESC NULLS FIRST                                                                
                                                              |
   |                                                            |   Projection: 
s.name, s.mark, g.grade                                                         
                                                             |
   |                                                            |     Filter: 
g.grade > Int64(2)                                                              
                                                               |
   |                                                            |       Inner 
Join:  Filter: s.mark BETWEEN g.min AND g.max                                   
                                                               |
   |                                                            |         
SubqueryAlias: s                                                                
                                                                   |
   |                                                            |           
TableScan: students                                                             
                                                                 |
   |                                                            |         
SubqueryAlias: g                                                                
                                                                   |
   |                                                            |           
TableScan: grades                                                               
                                                                 |
   | logical_plan after inline_table_scan                       | SAME TEXT AS 
ABOVE                                                                           
                                                              |
   | logical_plan after type_coercion                           | SAME TEXT AS 
ABOVE                                                                           
                                                              |
   | logical_plan after simplify_expressions                    | Sort: s.mark 
DESC NULLS FIRST                                                                
                                                              |
   |                                                            |   Projection: 
s.name, s.mark, g.grade                                                         
                                                             |
   |                                                            |     Filter: 
g.grade > Int64(2)                                                              
                                                               |
   |                                                            |       Inner 
Join:  Filter: s.mark >= g.min AND s.mark <= g.max AS s.mark BETWEEN g.min AND 
g.max                                                           |
   |                                                            |         
SubqueryAlias: s                                                                
                                                                   |
   |                                                            |           
TableScan: students                                                             
                                                                 |
   |                                                            |         
SubqueryAlias: g                                                                
                                                                   |
   |                                                            |           
TableScan: grades                                                               
                                                                 |
   ...
   | logical_plan after eliminate_cross_join                    | Sort: s.mark 
DESC NULLS FIRST                                                                
                                                              |
   |                                                            |   Projection: 
s.name, s.mark, g.grade                                                         
                                                             |
   |                                                            |     Filter: 
g.grade > Int64(2)                                                              
                                                               |
   |                                                            |       
CrossJoin:                                                                      
                                                                     |
   |                                                            |         
SubqueryAlias: s                                                                
                                                                   |
   |                                                            |           
TableScan: students                                                             
                                                                 |
   |                                                            |         
SubqueryAlias: g                                                                
                                                                   |
   |                                                            |           
TableScan: grades                                                               
                                                                 |
   ...
   | logical_plan after push_down_filter                        | Sort: s.mark 
DESC NULLS FIRST                                                                
                                                              |
   |                                                            |   Projection: 
s.name, s.mark, g.grade                                                         
                                                             |
   |                                                            |     
CrossJoin:                                                                      
                                                                       |
   |                                                            |       
SubqueryAlias: s                                                                
                                                                     |
   |                                                            |         
TableScan: students                                                             
                                                                   |
   |                                                            |       
SubqueryAlias: g                                                                
                                                                     |
   |                                                            |         
Filter: grades.grade > Int64(2)                                                 
                                                                   |
   |                                                            |           
TableScan: grades, partial_filters=[grades.grade > Int64(2)]
   ```
   
   The actual regression seems to be caused by the SQL planner generating the 
initial logical plan with an Inner Join instead of a Cross Join, and this 
propagates down to cause the bug.
   
   However it seems to highlight the actual flaw which is in 
`eliminate_cross_join` optimizer rule which converts the plan from:
   
   ```sql
   Sort: s.mark DESC NULLS FIRST
     Projection: s.name, s.mark, g.grade
       Filter: g.grade > Int64(2)
         Inner Join:  Filter: s.mark >= g.min AND s.mark <= g.max AS s.mark 
BETWEEN g.min AND g.max
           SubqueryAlias: s
             TableScan: students
           SubqueryAlias: g
             TableScan: grades
   ```
   
   to
   
   ```sql
   Sort: s.mark DESC NULLS FIRST
     Projection: s.name, s.mark, g.grade
       Filter: g.grade > Int64(2)
         CrossJoin:
           SubqueryAlias: s
             TableScan: students
           SubqueryAlias: g
             TableScan: grades
   ```
   
   Where it completely discards the Filter on the Inner Join when converting it 
to a Cross Join, causing the buggy behaviour


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to