2010YOUY01 commented on PR #17482:
URL: https://github.com/apache/datafusion/pull/17482#issuecomment-3280088456

   This is great! I have some suggestions for the planning part, and I'll 
review the execution part tomorrow.
   
   ### Refactor the in-equality extracting logic
   
   I suggest to move the inequality-extracting logic from `physical_planner.rs` 
into 
https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/extract_equijoin_predicate.rs
   
   The reason is we'd better put similar code into a single place, instead of 
let it scatter to multiple places. `ExtractEquijoinPredicate` logical optimizer 
rule is extracting equality join predicates like `t1.v1 = t2.v1`, here we want 
to extract `t1.v1 < t2.v1`, their logic should be very similar.
   
   To do this I think we need to extend the logical plan join node with extra 
ie predicate field (maybe we can define a new struct for IE predicate with 
`(Expr, Op, Expr)`, and we can also use that in other places)
   ```
   /// Join two logical plans on one or more join columns
   #[derive(Debug, Clone, PartialEq, Eq, Hash)]
   pub struct Join {
       ...
       /// Equijoin clause expressed as pairs of (left, right) join expressions
       pub on: Vec<(Expr, Expr)>,                                               
                  
       /// In-equility clause expressed as pairs of (left, right) join 
expressions           <-- HERE
       pub ie_predicates: Vec<(Expr, IEOp, Expr)>,
       /// Filters applied during join (non-equi conditions)
       pub filter: Option<Expr>,
       ...
   }
   ```
   
   To make it compatible for systems only use the `LogicalPlan` API, but not 
the physical plans, we can also provide a utility to move the IE predicates 
back to the filter:
   
   ```
   Before: 
   ie_predicates: [t1.v1 < t2.v1, t1.v2 < t2.v2]
   filter: (t1.v3 + t2.v3) = 100
   
   After:
   ie_predicates: []
   filter: ((t1.v3 + t2.v3) = 100) AND (t1.v1 < t2.v1) AND (t1.v2 < t2.v2)
   ```
   
   Perhaps we can open a PR only for this IE predicates extracting task, and 
during the initial planning we can simply move the IE predicates back to the 
filter  with the above mentioned utility.
   
   ### Make it configurable to turn on/off PWMJ
   I'll try to finish https://github.com/apache/datafusion/pull/17467 soon to 
make it easier, so let's put this on hold for now.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to