xudong963 edited a comment on pull request #1566:
URL: 
https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1013877492


   > Could you possibly provide some tests @xudong963 ?
   
   Sure. The correctness of the results is already covered by the existing 
tests, so I can add a test for the logical plan.
   
   
   > I was expecting to see code that basically applied an algebraic 
transformation on predicates like:
   
   This PR doesn't do that transformation. It does the following.
   
   First of all, let's see the example:
   ```
   ❯ create table part as select 1 as p_partkey;
   0 rows in set. Query took 0.003 seconds.
   ❯ create table lineitem as select 1 as l_partkey, 2 as l_suppkey;
   0 rows in set. Query took 0.005 seconds.
   ❯ create table supplier as select 1 as s_suppkey;
   0 rows in set. Query took 0.002 seconds.
   ❯ explain select * from part, supplier, lineitem where p_partkey = l_partkey and s_suppkey = l_suppkey;
   +---------------+--------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                       |
   +---------------+--------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #part.p_partkey, #supplier.s_suppkey, #lineitem.l_partkey, #lineitem.l_suppkey |
   |               |   Join: #part.p_partkey = #lineitem.l_partkey, #supplier.s_suppkey = #lineitem.l_suppkey   |
   |               |     CrossJoin:                                                                             |
   |               |       TableScan: part projection=Some([0])                                                 |
   |               |       TableScan: supplier projection=Some([0])                                             |
   |               |     TableScan: lineitem projection=Some([0, 1])                                            |
   ```
   
https://github.com/apache/arrow-datafusion/blob/6f7b2d25fb75c843efed67fbd72d09b2c2d6c2eb/datafusion/src/sql/planner.rs#L718
   In the `for` loop, `left` is initially `part` and `right` is `supplier`. 
There is no join key between `part` and `supplier`, so the planner produces a 
cross join between them, which is expensive.
   
   With this PR, `supplier` is pushed to `mut_plans` instead. After `part` and 
`lineitem` are inner joined, `supplier` can be inner joined with the result:
   ```
   +---------------+--------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                       |
   +---------------+--------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: #part.p_partkey, #lineitem.l_partkey, #lineitem.l_suppkey, #supplier.s_suppkey |
   |               |   Join: #lineitem.l_suppkey = #supplier.s_suppkey                                          |
   |               |     Join: #part.p_partkey = #lineitem.l_partkey                                            |
   |               |       TableScan: part projection=Some([0])                                                 |
   |               |       TableScan: lineitem projection=Some([0, 1])                                          |
   |               |     TableScan: supplier projection=Some([0])                                               |
   ```
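
To make the idea concrete, here is a minimal, self-contained sketch of the deferral strategy described above. It is not the code in `planner.rs`: `Plan`, `Col`, `order_joins`, `keys_between`, `example_plan`, and the `pending` queue are hypothetical names for illustration, with `pending` playing roughly the role of `mut_plans`. Columns are assumed to be written as `table.column` strings. A table that shares no join key with the current left side is pushed back and retried later, and a cross join is used only once every remaining table has been tried.

```rust
use std::collections::VecDeque;

/// Qualified column name, assumed to look like "table.column".
type Col = &'static str;

/// Toy logical plan (hypothetical, not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(&'static str),
    Join { left: Box<Plan>, right: Box<Plan>, on: Vec<(Col, Col)> },
    CrossJoin(Box<Plan>, Box<Plan>),
}

/// Table part of a qualified column name.
fn table_of(col: Col) -> &'static str {
    col.split('.').next().unwrap()
}

/// All table names scanned somewhere inside `plan`.
fn tables(plan: &Plan) -> Vec<&'static str> {
    match plan {
        Plan::Scan(t) => vec![*t],
        Plan::Join { left, right, .. } | Plan::CrossJoin(left, right) => {
            let mut names = tables(left);
            names.extend(tables(right));
            names
        }
    }
}

/// Equality keys that connect `left` and `right`, oriented as (left column, right column).
fn keys_between(left: &Plan, right: &Plan, keys: &[(Col, Col)]) -> Vec<(Col, Col)> {
    let (lt, rt) = (tables(left), tables(right));
    keys.iter()
        .filter_map(|&(a, b)| {
            if lt.contains(&table_of(a)) && rt.contains(&table_of(b)) {
                Some((a, b))
            } else if lt.contains(&table_of(b)) && rt.contains(&table_of(a)) {
                Some((b, a))
            } else {
                None
            }
        })
        .collect()
}

/// Join the scans left to right, deferring any table that has no join key yet.
fn order_joins(scans: Vec<Plan>, keys: &[(Col, Col)]) -> Plan {
    let mut pending: VecDeque<Plan> = scans.into();
    let mut left = pending.pop_front().expect("at least one table");
    let mut deferred_in_a_row = 0;
    while let Some(right) = pending.pop_front() {
        let on = keys_between(&left, &right, keys);
        if !on.is_empty() {
            left = Plan::Join { left: Box::new(left), right: Box::new(right), on };
            deferred_in_a_row = 0;
        } else if deferred_in_a_row < pending.len() {
            // Some other table has not been tried against `left` yet: retry this one later.
            pending.push_back(right);
            deferred_in_a_row += 1;
        } else {
            // Every remaining table was tried without finding a key: fall back to a cross join.
            left = Plan::CrossJoin(Box::new(left), Box::new(right));
            deferred_in_a_row = 0;
        }
    }
    left
}

/// The `part, supplier, lineitem` query from the example above.
fn example_plan() -> Plan {
    let keys = [
        ("part.p_partkey", "lineitem.l_partkey"),
        ("supplier.s_suppkey", "lineitem.l_suppkey"),
    ];
    let scans = vec![Plan::Scan("part"), Plan::Scan("supplier"), Plan::Scan("lineitem")];
    order_joins(scans, &keys)
}
```

In this sketch, `example_plan()` joins `part` with `lineitem` first and then joins `supplier` on `l_suppkey = s_suppkey`, which has the same shape as the second plan above. With no join keys at all it still falls back to a cross join.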


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

