alamb commented on code in PR #7612:
URL: https://github.com/apache/arrow-datafusion/pull/7612#discussion_r1336355393


##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -331,6 +342,12 @@ pub async fn from_substrait_rel(
             }
         }
         Some(RelType::Join(join)) => {
+            if join.post_join_filter.is_some() {

Review Comment:
   👍 



##########
datafusion/substrait/src/logical_plan/producer.rs:
##########
@@ -277,30 +277,47 @@ pub fn to_substrait_rel(
             // parse filter if exists
             let in_join_schema = join.left.schema().join(join.right.schema())?;
             let join_filter = match &join.filter {
-                Some(filter) => Some(Box::new(to_substrait_rex(
+                Some(filter) => Some(to_substrait_rex(
                     filter,
                     &Arc::new(in_join_schema),
                     0,
                     extension_info,
-                )?)),
+                )?),
                 None => None,
             };
+
             // map the left and right columns to binary expressions in the 
form `l = r`
             // build a single expression for the ON condition, such as `l.a = 
r.a AND l.b = r.b`
             let eq_op = if join.null_equals_null {
                 Operator::IsNotDistinctFrom
             } else {
                 Operator::Eq
             };
-
-            let join_expr = to_substrait_join_expr(
+            let join_on = to_substrait_join_expr(
                 &join.on,
                 eq_op,
                 join.left.schema(),
                 join.right.schema(),
                 extension_info,
-            )?
-            .map(Box::new);
+            )?;
+
+            // create conjunction between `join_on` and `join_filter` to embed 
all join conditions,
+            // whether equal or non-equal in a single expression
+            let join_expr = match &join_on {

Review Comment:
   Perhaps you could use `conjunction` here and simplify the code 
https://docs.rs/datafusion/latest/datafusion/optimizer/utils/fn.conjunction.html



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -331,6 +342,12 @@ pub async fn from_substrait_rel(
             }
         }
         Some(RelType::Join(join)) => {
+            if join.post_join_filter.is_some() {

Review Comment:
   👍 



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -351,41 +362,62 @@ pub async fn from_substrait_rel(
                 None => None,
             };
             // If join expression exists, parse the `on` condition expression, 
build join and return
-            // Otherwise, build join with koin filter, without join keys
+            // Otherwise, build join with only the filter, without join keys
             match &join.expression.as_ref() {
                 Some(expr) => {
                     let on =
                         from_substrait_rex(expr, &in_join_schema, 
extensions).await?;
                     let predicates = split_conjunction(&on);
                     // TODO: collect only one null_eq_null
-                    let join_exprs: Vec<(Column, Column, bool)> = predicates
-                        .iter()
-                        .map(|p| match p {
+                    // The predicates can contain both equal and non-equal ops.

Review Comment:
   I looked into the substrait spec, and it doesn't really talk about what 
semantics of the `post_join_filter` is: 
https://substrait.io/relations/logical_relations/#join-operation
   
https://github.com/search?q=repo%3Asubstrait-io%2Fsubstrait%20post_join_filter&type=code
   
   There is a subtle distinction between non equality filters applied *during* 
the join (in the `ON` clause) and applied post join for non-INNER joins: for 
non inner joins the filters during the join don't filter out input rows (they 
still come out, just with NULL matches)
   
   > So the producer should only generate plans with None as post_join_filer. 
   
   This makes sense to me
   
   For the consumer, there is already code in DataFusion that breaks up an 
arbitrary `Expr` into equality predicates and others. This is how the SQL 
frontend creates a Join (a single expr):
   
   
https://github.com/apache/arrow-datafusion/blob/a514b6752b063a5a3006aa114297520a933339b0/datafusion/sql/src/relation/join.rs#L134-L141
   
   I think we could do the same here in the subtrait consumer which would be 
much simpler, and would let the normal DataFusion optimization machinery work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to