epsio-banay opened a new issue, #7530:
URL: https://github.com/apache/arrow-datafusion/issues/7530
### Describe the bug
This bug is very similar to #4844: the EliminateCrossJoin rule discards the
filter of an InnerJoin.
The fix for the previous bug (#4869) was to skip the rule if an InnerJoin
with a Filter was found.
Unfortunately, that fix only checks the top InnerJoin, so nested InnerJoins
are not checked and might lose their filter condition.
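To illustrate the gap, here is a minimal sketch of the difference between a check that stops at the top join and one that walks the whole tree. The `Plan` enum and both function names are hypothetical stand-ins made up for this example; they are not DataFusion's `LogicalPlan` or the actual code of the rule.

```rust
// Toy stand-in for a logical plan. This is NOT DataFusion's `LogicalPlan`.
#[allow(dead_code)]
#[derive(Debug)]
enum Plan {
    Scan(&'static str),
    InnerJoin {
        left: Box<Plan>,
        right: Box<Plan>,
        // Simplified: the join filter is kept as a display string.
        filter: Option<&'static str>,
    },
    Filter { input: Box<Plan> },
}

/// Shallow check: looks only at the first join found below the top-level filters.
fn top_join_has_filter(plan: &Plan) -> bool {
    match plan {
        Plan::Filter { input } => top_join_has_filter(input),
        Plan::InnerJoin { filter, .. } => filter.is_some(),
        Plan::Scan(_) => false,
    }
}

/// Recursive check: true if any inner join anywhere in the tree carries a filter.
fn any_join_has_filter(plan: &Plan) -> bool {
    match plan {
        Plan::Filter { input } => any_join_has_filter(input),
        Plan::InnerJoin { left, right, filter } => {
            filter.is_some() || any_join_has_filter(left) || any_join_has_filter(right)
        }
        Plan::Scan(_) => false,
    }
}

/// Same shape as the plan in this report: the filter sits on the nested join.
fn nested_example() -> Plan {
    Plan::Filter {
        input: Box::new(Plan::InnerJoin {
            left: Box::new(Plan::InnerJoin {
                left: Box::new(Plan::Scan("t1")),
                right: Box::new(Plan::Scan("t3")),
                filter: Some("t1.a > 20"),
            }),
            right: Box::new(Plan::Scan("t2")),
            filter: None,
        }),
    }
}

fn main() {
    let plan = nested_example();
    // The shallow check misses the nested filter; the recursive one finds it.
    println!("top join only: {}", top_join_has_filter(&plan)); // false
    println!("whole tree:    {}", any_join_has_filter(&plan)); // true
}
```

A rule guarded only by the shallow check would proceed on this plan, which matches the behavior described here.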
### To Reproduce
Add the following test to `eliminate_cross_join.rs`:
```
#[test]
fn eliminate_cross_not_possible_nested_inner_join_with_filter() -> Result<()> {
    let t1 = test_table_scan_with_name("t1")?;
    let t2 = test_table_scan_with_name("t2")?;
    let t3 = test_table_scan_with_name("t3")?;

    // The nested inner join carries a filter, so the rule must not rewrite the plan.
    let plan = LogicalPlanBuilder::from(t1)
        .join(
            t3,
            JoinType::Inner,
            (vec!["t1.a"], vec!["t3.a"]),
            Some(col("t1.a").gt(lit(20u32))),
        )?
        .join(t2, JoinType::Inner, (vec!["t1.a"], vec!["t2.a"]), None)?
        .filter(col("t1.a").lt(lit(15u32)))?
        .build()?;

    let expected = vec![
        "Filter: t1.a < UInt32(15) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
        "  Inner Join: t1.a = t2.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
        "    Inner Join: t1.a = t3.a Filter: t1.a > UInt32(20) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
        "      TableScan: t1 [a:UInt32, b:UInt32, c:UInt32]",
        "      TableScan: t3 [a:UInt32, b:UInt32, c:UInt32]",
        "    TableScan: t2 [a:UInt32, b:UInt32, c:UInt32]",
    ];

    // Sanity check: the unoptimized plan already has the expected shape.
    let formatted = plan.display_indent_schema().to_string();
    let actual: Vec<&str> = formatted.trim().lines().collect();
    assert_eq!(
        expected, actual,
        "\n\nexpected:\n\n{expected:#?}\nactual:\n\n{actual:#?}\n\n"
    );

    // The optimized plan should be unchanged.
    assert_optimized_plan_eq(&plan, expected);

    Ok(())
}
```
The test will fail in `assert_optimized_plan_eq` because after the rule is
applied the plan will be:
```
"Filter: t1.a < UInt32(15) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
"  Inner Join: t1.a = t2.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
"    Inner Join: t1.a = t3.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
"      TableScan: t1 [a:UInt32, b:UInt32, c:UInt32]",
"      TableScan: t3 [a:UInt32, b:UInt32, c:UInt32]",
"    TableScan: t2 [a:UInt32, b:UInt32, c:UInt32]",
```
This means the filter on the nested inner join (`t1.a > UInt32(20)`) was
incorrectly dropped.
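To make the correctness impact concrete, here is a small sketch that evaluates the two plan shapes by hand over plain slices. It does not use DataFusion; `run_plan`, its parameters, and the sample values are assumptions chosen for illustration, with each table reduced to its `a` column.

```rust
/// Evaluates `Filter(t1.a < 15)` over `(t1 JOIN t3 ON t1.a = t3.a) JOIN t2 ON t1.a = t2.a`
/// and returns the surviving `t1.a` values.
/// When `keep_join_filter` is true, the nested join also applies `t1.a > 20`.
fn run_plan(t1: &[u32], t2: &[u32], t3: &[u32], keep_join_filter: bool) -> Vec<u32> {
    let mut out = Vec::new();
    for &a1 in t1 {
        for &a3 in t3 {
            // Nested inner join: t1.a = t3.a, plus the optional join filter.
            if a1 != a3 || (keep_join_filter && a1 <= 20) {
                continue;
            }
            for &a2 in t2 {
                // Outer inner join on t1.a = t2.a, then the top-level filter.
                if a1 == a2 && a1 < 15 {
                    out.push(a1);
                }
            }
        }
    }
    out
}

fn main() {
    // Hypothetical data: every table holds the values 10 and 25 in column `a`.
    let t = [10u32, 25];
    // With the join filter, no row can satisfy both t1.a > 20 and t1.a < 15.
    println!("join filter kept:    {:?}", run_plan(&t, &t, &t, true)); // []
    // Without it, the row with a = 10 wrongly appears in the result.
    println!("join filter dropped: {:?}", run_plan(&t, &t, &t, false)); // [10]
}
```

On this data the original plan returns no rows, while the plan without the join filter returns one, so the rewrite changes query results rather than just the plan shape.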
### Expected behavior
If any InnerJoin in the plan, including a nested one, has a Filter, the rule
should either be skipped or preserve that filter in the optimized plan.
### Additional context
#4866 will probably fix this bug, but I think a quicker fix should be applied
here first to restore the correctness of query results.