eejbyfeldt commented on PR #12943:
URL: https://github.com/apache/datafusion/pull/12943#issuecomment-2444098388

   > This is unexpected to me, as my intuition assumes that the WHERE clause 
always filters the pre-aggregated data (i.e. it is always pushed down)
   
   I don't think "always pushed down" is the correct description here. My 
understanding is that it would be planned as a filter before the aggregation 
and therefore would not needed to be pushed down. In DataFusion this can be 
seen if we do 
   ```
   > EXPLAIN VERBOSE select distinct on (a) a, b from foo where b = 1 order by 
a, b desc;
   
   
+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type                                                  | plan          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                       |
   
+------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | initial_logical_plan                                       | DistinctOn: 
on_expr=[[foo.a]], select_expr=[[foo.a, foo.b]], sort_expr=[[foo.a ASC NULLS 
FIRST, foo.b DESC NULLS LAST]]                                                  
                                                                                
                                                                                
                                                                                
                                                                                
                            |
   |                                                            |   Filter: 
foo.b = Int64(1)                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                           |
   |                                                            |     
TableScan: foo                              
   ...
   ```
   
   
   To get the filter above the aggregation (in postgres) we can do
   ```
   > explain select * from (select distinct on (a) a, b from foo) t where b = 1 
order by a, b desc;
                                  QUERY PLAN                                
   -------------------------------------------------------------------------
    Subquery Scan on t  (cost=158.51..172.31 rows=1 width=8)
      Filter: (t.b = 1)
      ->  Unique  (cost=158.51..169.81 rows=200 width=8)
            ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
                  Sort Key: foo.a
                  ->  Seq Scan on foo  (cost=0.00..32.60 rows=2260 width=8)
   ```
   e.g not pushed down and if we change the filter to `a = 1` we get
   ```
   > explain select * from (select distinct on (a) a, b from foo) t where a = 1 
order by a, b desc;
                              QUERY PLAN                            
   -----------------------------------------------------------------
    Sort  (cost=38.55..38.58 rows=11 width=8)
      Sort Key: foo.b DESC
      ->  Unique  (cost=0.00..38.25 rows=11 width=8)
            ->  Seq Scan on foo  (cost=0.00..38.25 rows=11 width=8)
                  Filter: (a = 1)
   ```
   so this filter gets pushed down.
   
   So postgres seems to agree with us that the filter can be pushed down in 
some case and not in others.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to