alamb commented on pull request #7880: URL: https://github.com/apache/arrow/pull/7880#issuecomment-674429867
I have had this nagging sensation that the algorithm isn't quite right when the same column is used multiple times. I finally came up with an example that shows part of what I have been worrying about. Here is a new test that passes on this branch but I think is incorrect. Specifically, with two `Selection`s for the same variable separated by a `Limit`, one of the `Selection`s is lost with the algorithm as written. Am I missing something? ``` #[test] fn filter_2_breaks_limits() -> Result<()> { let table_scan = test_table_scan()?; let plan = LogicalPlanBuilder::from(&table_scan) .project(vec![col("a")])? .filter(col("a").lt_eq(&Expr::Literal(ScalarValue::Int64(1))))? .limit(1)? .project(vec![col("a")])? .filter(col("a").gt_eq(&Expr::Literal(ScalarValue::Int64(1))))? .build()?; // Should be able to move both filters below the projections // not part of the test assert_eq!( format!("{:?}", plan), "Selection: #a GtEq Int64(1)\ \n Projection: #a\ \n Limit: 1\ \n Selection: #a LtEq Int64(1)\ \n Projection: #a\ \n TableScan: test projection=None" ); // This just seems wong: we lost a selection.... let expected = "\ Projection: #a\ \n Selection: #a GtEq Int64(1)\ \n Limit: 1\ \n Projection: #a\ \n TableScan: test projection=None"; assert_optimized_plan_eq(&plan, expected); Ok(()) } ``` FYI @jorgecarleitao ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org