eejbyfeldt commented on PR #12943: URL: https://github.com/apache/datafusion/pull/12943#issuecomment-2444098388
> This is unexpected to me, as my intuition assumes that the WHERE clause always filters the pre-aggregated data (i.e. it is always pushed down) I don't think "always pushed down" is the correct description here. My understanding is that it would be planned as a filter before the aggregation and therefore would not needed to be pushed down. In DataFusion this can be seen if we do ``` > EXPLAIN VERBOSE select distinct on (a) a, b from foo where b = 1 order by a, b desc; +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | plan_type | plan | +------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | initial_logical_plan | DistinctOn: on_expr=[[foo.a]], select_expr=[[foo.a, foo.b]], sort_expr=[[foo.a ASC NULLS FIRST, foo.b DESC NULLS LAST]] | | | Filter: foo.b = Int64(1) | | | TableScan: foo ... ``` To get the filter above the aggregation (in postgres) we can do ``` > explain select * from (select distinct on (a) a, b from foo) t where b = 1 order by a, b desc; QUERY PLAN ------------------------------------------------------------------------- Subquery Scan on t (cost=158.51..172.31 rows=1 width=8) Filter: (t.b = 1) -> Unique (cost=158.51..169.81 rows=200 width=8) -> Sort (cost=158.51..164.16 rows=2260 width=8) Sort Key: foo.a -> Seq Scan on foo (cost=0.00..32.60 rows=2260 width=8) ``` e.g not pushed down and if we change the filter to `a = 1` we get ``` > explain select * from (select distinct on (a) a, b from foo) t where a = 1 order by a, b desc; QUERY PLAN ----------------------------------------------------------------- Sort (cost=38.55..38.58 rows=11 width=8) Sort Key: foo.b DESC -> Unique (cost=0.00..38.25 rows=11 width=8) -> Seq Scan on foo (cost=0.00..38.25 rows=11 width=8) Filter: (a = 1) ``` so this filter gets pushed down. So postgres seems to agree with us that the filter can be pushed down in some case and not in others. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org