ygf11 opened a new issue, #4389:
URL: https://github.com/apache/arrow-datafusion/issues/4389

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Join keys in `LogicalPlan::Join` are `columns` now, which is the same as 
physical plan.
   To add features base on it, we need add additional projections in logical 
plan level, like #4193 and #4353. There are some drawbacks:
   
   * Join keys may need alias when do type coercion, and the alias will display 
in the logical plan (also make display verbose).
   * Our logical plan will be verbose with a lot additional projections.
   * In optimizing stage, we have to check the additional projections if we 
want to optimize joins.
   
   ### Join key need alias when do type coercion ###
   One join key may happen one more time in join on condition, then it may also 
need do type coercion one more time, to distinguish each position, we have to 
add alias for them.
   For example, we have:
   ```sql
   --- t0.a t1.a -> UInt8
   --- t0.b t1.b -> Int16
   --- t0.c t1.c -> Int32
   select t0.a, t1.a from test0 as t0
                     inner join test1 as t1
                     on t0.a = t1.b and t0.a = t1.c      
   ```
   
   then the logical plan will:
   ```
   -- before optimizer
   Projection: t0.a, t1.a                                                       
                                                                                
                                          
    Inner Join: t0.a = t1.b, t0.a = t1.c                                        
                                                                                
                                     
     Projection: t0.a, t0.b, t0.c                                               
                                                                    
      SubqueryAlias: t0                                                         
                                                                                
                                       
       TableScan: test0 projection=[a,b,c]                                      
                                                                                
                                         
      SubqueryAlias: t1 
   
   -- after type coercion
   Projection: t0.a, t1.a                                                       
                                                                                
                                          
    Inner Join: t0.a#0 = t1.b, t0.a#1 = t1.c                                    
                                                                                
                                         
     Projection: t0.a, CAST(t0.a AS Int16) AS t0.a#0, CAST(t0.a AS Int32) AS 
t0.a#1                                                                          
                                           
      SubqueryAlias: t0                                                         
                                                                                
                                       
       TableScan: test0 projection=[a]                                          
                                                                                
                                     
      SubqueryAlias: t1                                                         
                                                                                
                                        
   ```
   
   This makes our logical plan more complex.
   
   ### logical plan is verbose ###
   for sql:
   ```sql
   select t0.a, t1.b
          from test0 as t0
          inner join test1 as t1
          on t0.a + 1 = t1.a * 2;
   ```
   the logical plan will be like(does not do type coercion):
   ```
   Projection: t0.a, t1.b                                                       
                                                                   
    Inner Join: t0.a + Int64(1) = t1.a * Int64(2)                               
                                                                         
     Projection: t0.a, CAST(t0.a AS Int64) + Int64(1)                           
                                                                       
      SubqueryAlias: t0                                                         
                                                                       
       TableScan: test0 projection=[a]                                          
                                                                     
     Projection: t1.b, CAST(t1.a AS Int64) * Int64(2)                           
                                                                        
      SubqueryAlias: t1                                                         
                                                                       
       TableScan: test1 projection=[a, b]                                       
                                                                      
   ```
   
   Two projections will be added before join.
   
   **Describe the solution you'd like**
   I would like to fold the additional projections to join(logical-plan), and 
keep the physical join as before. I think it can make our logical plan clean 
and easy to extend.
   ```rust
   /// Join two logical plans on one or more join columns
   #[derive(Clone)]
   pub struct Join {
       ...
       /// Equijoin clause expressed as pairs of (left, right) join columns
       pub on: Vec<(Expr,Expr)>,
      ...
   }
   ``` 
   
   After I check the current code base, the main changes are the optimizers:
   
   * eliminate_cross_join.
   * filter_null_join_key.
   * filter_push_down.
   * projection_pushdown.
   * subquery_filter_to_join.
   
   We can step by step to finish it, and refactor the LogicalPlan::Join finally.
   
   
   
   **Describe alternatives you've considered**
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to