nseekhao commented on code in PR #7612:
URL: https://github.com/apache/arrow-datafusion/pull/7612#discussion_r1333340499
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -351,41 +362,62 @@ pub async fn from_substrait_rel(
None => None,
};
// If join expression exists, parse the `on` condition expression,
build join and return
- // Otherwise, build join with koin filter, without join keys
+ // Otherwise, build join with only the filter, without join keys
match &join.expression.as_ref() {
Some(expr) => {
let on =
from_substrait_rex(expr, &in_join_schema,
extensions).await?;
let predicates = split_conjunction(&on);
// TODO: collect only one null_eq_null
- let join_exprs: Vec<(Column, Column, bool)> = predicates
- .iter()
- .map(|p| match p {
+ // The predicates can contain both equal and non-equal ops.
Review Comment:
@Dandandan Thank you for pointing this out. I agree that this makes things
more complicated, and my apologies for the inaccurate explanation of the issue.
I added a comment to correct my description of the
[issue](https://github.com/apache/arrow-datafusion/issues/7611#issuecomment-1729906548).
The high-level idea is that the join `filter` and `post_join_filter` are not
semantically equivalent. The former filters the input during/before the join;
the latter filters the output of the join (post-join). Currently, `datafusion`
has no field in the join relation that represents a post-join predicate (the
parser/logical optimizer takes care of creating an appropriate filter relation
if necessary). So the producer should only generate plans with `None` as the
`post_join_filter`.
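To make the semantic difference concrete, here is a toy sketch of a left outer join over plain integers. It is not DataFusion or Substrait code; `left_join` and `filter_after_join` are hypothetical names used only for illustration.

```rust
// Toy left outer join over in-memory rows; not DataFusion code.
// `left_join` and `filter_after_join` are hypothetical illustration names.
fn left_join(
    left: &[i32],
    right: &[i32],
    join_filter: impl Fn(i32, i32) -> bool,
) -> Vec<(i32, Option<i32>)> {
    let mut out = Vec::new();
    for &l in left {
        let mut matched = false;
        for &r in right {
            // Equi-join key plus the extra join filter, evaluated while matching.
            if l == r && join_filter(l, r) {
                out.push((l, Some(r)));
                matched = true;
            }
        }
        // A left row with no surviving match is still emitted, null-padded.
        if !matched {
            out.push((l, None));
        }
    }
    out
}

// Applies a predicate to the join output, as a post-join filter would.
fn filter_after_join(
    joined: Vec<(i32, Option<i32>)>,
    post_filter: impl Fn(i32, i32) -> bool,
) -> Vec<(i32, Option<i32>)> {
    joined
        .into_iter()
        // A null right side cannot satisfy the predicate, so the row is dropped.
        .filter(|&(l, r)| r.map_or(false, |r| post_filter(l, r)))
        .collect()
}
```

With `left = right = [1, 2, 3]` and the predicate `r > 1`, using the predicate as the join filter keeps row 1 with a null right side, while applying it after an unfiltered join drops row 1 entirely.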
I'll modify the consumer to return an error for now if there's a
`post_join_filter`. Later, we can wrap the join relation in a filter relation
if we want to support `post_join_filter`. Alternatively, if a later version of
`datafusion` supports `post_join_filter` directly in the join relation, we can
add that support to both the producer and the consumer.
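Roughly, the two consumer options look like the sketch below. The `JoinRel` and `Plan` types here are simplified stand-ins I made up for illustration (only the `expression` and `post_join_filter` field names come from the Substrait join relation); this is not the actual change in the PR.

```rust
// Stand-in types only; the real Substrait `JoinRel` and DataFusion
// `LogicalPlan` are much richer. All names here are hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct JoinRel {
    expression: Option<String>,
    post_join_filter: Option<String>,
}

#[derive(Debug, PartialEq)]
enum Plan {
    Join { on: Option<String> },
    Filter { predicate: String, input: Box<Plan> },
}

// Option for now: refuse plans that carry a post-join filter.
fn consume_join(join: &JoinRel) -> Result<Plan, String> {
    if join.post_join_filter.is_some() {
        return Err("JoinRel with post_join_filter is not yet supported".to_string());
    }
    Ok(Plan::Join { on: join.expression.clone() })
}

// Possible follow-up: wrap the join in a filter relation instead.
fn consume_join_wrapping(join: &JoinRel) -> Plan {
    let joined = Plan::Join { on: join.expression.clone() };
    match &join.post_join_filter {
        Some(p) => Plan::Filter {
            predicate: p.clone(),
            input: Box::new(joined),
        },
        None => joined,
    }
}
```

The wrapping variant keeps the join itself unchanged and only adds a parent filter, which matches how the parser/logical optimizer already expresses post-join predicates.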