sgrebnov commented on PR #13496: URL: https://github.com/apache/datafusion/pull/13496#issuecomment-2492691910
@jonathanc-n, @goldmedal - I've reviewed this change and it brings back the following issue (there is additional context of why filtering added this way produces incorrect result) https://github.com/apache/datafusion/pull/13132 . I really like the change but can we improve this to see if we can wrap TableScan with Filter as a subquery when it is required Example query. ## Original query / LogicalPlan / Result ``` select c_custkey, count(o_orderkey) from customer left join orders on c_custkey = o_custkey and o_comment not like '%special%requests%' group by c_custkey ``` ```console | | Projection: customer.c_custkey, count(orders.o_orderkey) | | | Aggregate: groupBy=[[customer.c_custkey]], aggr=[[count(orders.o_orderkey)]] | | | Left Join: Filter: customer.c_custkey = orders.o_custkey | | | TableScan: customer projection=[c_custkey] | | | TableScan: orders projection=[o_orderkey, o_custkey], full_filters=[orders.o_comment NOT LIKE Utf8("%special%requests%")] ``` Result: ``` 1489 29 1269 0 652 24 273 0 51 0 ... ``` ## Existing unparser (main) ``` select "customer"."c_custkey", count("orders"."o_orderkey") from "customer" left join "orders" on (("customer"."c_custkey" = "orders"."o_custkey") and "orders"."o_comment" not like '%special%requests%') group by "customer"."c_custkey" ``` Result: ``` 1489 29 1269 0 652 24 273 0 51 0 ... ``` ## Proposed change ``` select "customer"."c_custkey", count("orders"."o_orderkey") from "customer" left join "orders" on ("customer"."c_custkey" = "orders"."o_custkey") where "orders"."o_comment" not like '%special%requests%' group by "customer"."c_custkey" ``` Result ``` 1489 29 652 24 1091 1 70 15 839 14 ``` If intent for filter to be moved it must be wrapped as subquery in this case: ``` select c_custkey, count(o_orderkey) from customer left join (select * from orders where o_comment not like '%special%requests%') on c_custkey = o_custkey group by c_custkey ``` Result ``` 1489 29 1269 0 652 24 273 0 51 0 ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org