2010YOUY01 commented on PR #17482: URL: https://github.com/apache/datafusion/pull/17482#issuecomment-3280088456
This is great! I have some suggestions for the planning part, and I'll review the execution part tomorrow. ### Refactor the in-equality extracting logic I suggest to move the inequality-extracting logic from `physical_planner.rs` into https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/extract_equijoin_predicate.rs The reason is we'd better put similar code into a single place, instead of let it scatter to multiple places. `ExtractEquijoinPredicate` logical optimizer rule is extracting equality join predicates like `t1.v1 = t2.v1`, here we want to extract `t1.v1 < t2.v1`, their logic should be very similar. To do this I think we need to extend the logical plan join node with extra ie predicate field (maybe we can define a new struct for IE predicate with `(Expr, Op, Expr)`, and we can also use that in other places) ``` /// Join two logical plans on one or more join columns #[derive(Debug, Clone, PartialEq, Eq, Hash)] pub struct Join { ... /// Equijoin clause expressed as pairs of (left, right) join expressions pub on: Vec<(Expr, Expr)>, /// In-equility clause expressed as pairs of (left, right) join expressions <-- HERE pub ie_predicates: Vec<(Expr, IEOp, Expr)>, /// Filters applied during join (non-equi conditions) pub filter: Option<Expr>, ... } ``` To make it compatible for systems only use the `LogicalPlan` API, but not the physical plans, we can also provide a utility to move the IE predicates back to the filter: ``` Before: ie_predicates: [t1.v1 < t2.v1, t1.v2 < t2.v2] filter: (t1.v3 + t2.v3) = 100 After: ie_predicates: [] filter: ((t1.v3 + t2.v3) = 100) AND (t1.v1 < t2.v1) AND (t1.v2 < t2.v2) ``` Perhaps we can open a PR only for this IE predicates extracting task, and during the initial planning we can simply move the IE predicates back to the filter with the above mentioned utility. ### Make it configurable to turn on/off PWMJ I'll try to finish https://github.com/apache/datafusion/pull/17467 soon to make it easier, so let's put this on hold for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
