epsio-banay opened a new issue, #7530:
URL: https://github.com/apache/arrow-datafusion/issues/7530

   ### Describe the bug
   
   This bug is very similar to #4844: the EliminateCrossJoin rule discards the filter conditions of InnerJoin nodes.
   The fix for the previous bug (#4869) was to skip the rule when an InnerJoin with a Filter was found.
   Unfortunately, that fix only checks the top-level InnerJoin, so nested InnerJoins are not checked and may lose their filter conditions.
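
The difference between checking only the top join and checking every join can be sketched on a toy plan tree. This is a minimal illustration under stated assumptions: `Plan`, `top_join_has_filter`, `any_join_has_filter`, and `example_plan` are hypothetical names invented here, not DataFusion's `LogicalPlan` API or the actual code of the rule.

```rust
// Hypothetical, simplified plan model. These are NOT DataFusion's real types.
#[derive(Debug)]
enum Plan {
    Scan(&'static str),
    Filter { predicate: &'static str, input: Box<Plan> },
    InnerJoin { left: Box<Plan>, right: Box<Plan>, filter: Option<&'static str> },
}

/// Top-level-only check: skips past leading Filter nodes and inspects
/// just the first join it meets, never descending into that join's inputs.
fn top_join_has_filter(plan: &Plan) -> bool {
    match plan {
        Plan::Scan(_) => false,
        Plan::Filter { input, .. } => top_join_has_filter(input),
        Plan::InnerJoin { filter, .. } => filter.is_some(),
    }
}

/// Recursive check: true if any inner join anywhere in the tree has a filter.
fn any_join_has_filter(plan: &Plan) -> bool {
    match plan {
        Plan::Scan(_) => false,
        Plan::Filter { input, .. } => any_join_has_filter(input),
        Plan::InnerJoin { left, right, filter } => {
            filter.is_some() || any_join_has_filter(left) || any_join_has_filter(right)
        }
    }
}

/// Plan shape from this report: a Filter over a join without a filter,
/// whose left input is a join that does carry a filter.
fn example_plan() -> Plan {
    let nested = Plan::InnerJoin {
        left: Box::new(Plan::Scan("t1")),
        right: Box::new(Plan::Scan("t3")),
        filter: Some("t1.a > 20"),
    };
    let top = Plan::InnerJoin {
        left: Box::new(nested),
        right: Box::new(Plan::Scan("t2")),
        filter: None,
    };
    Plan::Filter { predicate: "t1.a < 15", input: Box::new(top) }
}
```

On `example_plan()`, the top-level check reports no filter while the recursive check finds the nested one, which is the gap described above.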
   
   ### To Reproduce
   
   Add the below test to eliminate_cross_join.rs:
   
   ```
   #[test]
   fn eliminate_cross_not_possible_nested_inner_join_with_filter() -> Result<()> {
       let t1 = test_table_scan_with_name("t1")?;
       let t2 = test_table_scan_with_name("t2")?;
       let t3 = test_table_scan_with_name("t3")?;

       // The rule must not be applied: the nested inner join has a filter.
       let plan = LogicalPlanBuilder::from(t1)
           .join(
               t3,
               JoinType::Inner,
               (vec!["t1.a"], vec!["t3.a"]),
               Some(col("t1.a").gt(lit(20u32))),
           )?
           .join(t2, JoinType::Inner, (vec!["t1.a"], vec!["t2.a"]), None)?
           .filter(col("t1.a").lt(lit(15u32)))?
           .build()?;

       // The optimized plan is expected to be identical to the input plan.
       let expected = vec![
           "Filter: t1.a < UInt32(15) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
           "  Inner Join: t1.a = t2.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
           "    Inner Join: t1.a = t3.a Filter: t1.a > UInt32(20) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
           "      TableScan: t1 [a:UInt32, b:UInt32, c:UInt32]",
           "      TableScan: t3 [a:UInt32, b:UInt32, c:UInt32]",
           "    TableScan: t2 [a:UInt32, b:UInt32, c:UInt32]",
       ];

       let formatted = plan.display_indent_schema().to_string();
       let actual: Vec<&str> = formatted.trim().lines().collect();

       assert_eq!(
           expected, actual,
           "\n\nexpected:\n\n{expected:#?}\nactual:\n\n{actual:#?}\n\n"
       );

       assert_optimized_plan_eq(&plan, expected);

       Ok(())
   }
   ```
   
   The test fails in `assert_optimized_plan_eq` because, after the rule is applied, the plan becomes:
   ```
   "Filter: t1.a < UInt32(15) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
   "  Inner Join: t1.a = t2.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
   "    Inner Join: t1.a = t3.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]",
   "      TableScan: t1 [a:UInt32, b:UInt32, c:UInt32]",
   "      TableScan: t3 [a:UInt32, b:UInt32, c:UInt32]",
   "    TableScan: t2 [a:UInt32, b:UInt32, c:UInt32]",
   ```
   
   This means the nested InnerJoin's filter (t1.a > UInt32(20)) was incorrectly dropped.
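
To see why losing that filter changes query results, here is a small simulation of the query over plain vectors of `a` values. It is only a sketch: `gt_20`, `inner_join`, and `run_query` are hypothetical helpers written for this illustration, and the tables are reduced to a single `u32` column.

```rust
/// The nested join's filter from the report: t1.a > 20.
fn gt_20(a: u32) -> bool {
    a > 20
}

/// Equi-joins two single-column tables on `a`. A matching pair is kept only
/// if the optional join filter accepts the left-hand value.
fn inner_join(left: &[u32], right: &[u32], join_filter: Option<fn(u32) -> bool>) -> Vec<u32> {
    let mut out = Vec::new();
    for &l in left {
        for &r in right {
            if l == r && join_filter.map_or(true, |f| f(l)) {
                out.push(l);
            }
        }
    }
    out
}

/// Evaluates the query shape from the report:
/// (t1 JOIN t3 ON a [AND t1.a > 20]) JOIN t2 ON a WHERE t1.a < 15.
/// `keep_join_filter = false` models the plan after the filter is dropped.
fn run_query(t1: &[u32], t2: &[u32], t3: &[u32], keep_join_filter: bool) -> Vec<u32> {
    let filter = if keep_join_filter {
        Some(gt_20 as fn(u32) -> bool)
    } else {
        None
    };
    let nested = inner_join(t1, t3, filter);
    let joined = inner_join(&nested, t2, None);
    joined.into_iter().filter(|&a| a < 15).collect()
}
```

With all three tables holding `[5, 10, 25, 30]`, `run_query(.., true)` returns no rows, since no value is both greater than 20 and less than 15, while `run_query(.., false)` returns `[5, 10]`. The rewritten plan therefore produces rows the original query excludes.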
   
   ### Expected behavior
   
   When an InnerJoin has a Filter, the rule should either be skipped or rewrite the plan without dropping the filter.
   
   ### Additional context
   
   #4866 will probably fix this bug, but I think a quicker fix is needed here to restore the correctness of query results.

