adriangb commented on issue #20324:
URL: https://github.com/apache/datafusion/issues/20324#issuecomment-3893869335

   I've been able to reproduce something roughly similar:
   
   ```sql
   set datafusion.execution.parquet.binary_as_string = true;
   set datafusion.execution.parquet.pushdown_filters = false;
   
   create external table hits stored as parquet location 'benchmarks/data/hits_partitioned';
   CREATE VIEW hits_view AS SELECT "EventDate" + 0 as "EventDate" FROM hits;
   
   set datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown = false;
   SELECT MIN("EventDate"), MAX("EventDate") FROM hits_view where "EventDate" > 0 AND "EventDate" < 99999999;
   
   set datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown = true;
   SELECT MIN("EventDate"), MAX("EventDate") FROM hits_view;
   ```
   
   ```
   +--------------------------+--------------------------+
   | min(hits_view.EventDate) | max(hits_view.EventDate) |
   +--------------------------+--------------------------+
   | 15888                    | 15917                    |
   +--------------------------+--------------------------+
   1 row(s) fetched. 
   Elapsed 0.026 seconds.
   
   0 row(s) fetched. 
   Elapsed 0.000 seconds.
   
   +--------------------------+--------------------------+
   | min(hits_view.EventDate) | max(hits_view.EventDate) |
   +--------------------------+--------------------------+
   | 15888                    | 15917                    |
   +--------------------------+--------------------------+
   1 row(s) fetched. 
   Elapsed 0.135 seconds.
   ```
   
   The filter for the second query is `predicate=DynamicFilter [ CAST(EventDate@5 AS Int64) + 0 < 15888 OR CAST(EventDate@5 AS Int64) + 0 > 15917 ]`
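   
   The comment does not say how the predicate was obtained. One way to see it, and to compare per-operator metrics between the two runs, is `EXPLAIN ANALYZE`. A minimal sketch, assuming the same session as the repro above (the `hits` table and `hits_view` already exist); the exact plan text and metrics will depend on the DataFusion version:
   
   ```sql
   -- Assumption: run in the same datafusion-cli session as the repro above.
   -- Baseline: dynamic filter off, hard-coded range filter.
   set datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown = false;
   EXPLAIN ANALYZE SELECT MIN("EventDate"), MAX("EventDate") FROM hits_view where "EventDate" > 0 AND "EventDate" < 99999999;
   
   -- Dynamic filter on, no explicit filter; look for `predicate=DynamicFilter [...]` in the plan.
   set datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown = true;
   EXPLAIN ANALYZE SELECT MIN("EventDate"), MAX("EventDate") FROM hits_view;
   ```
   
   Comparing the time reported for the scan and filter operators in the two plans may help narrow down whether the extra time is spent evaluating the dynamic filter itself.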
   
   Setting the `CAST` aside (I expect it to be optimized away, and I don't know why it's there at all, since the column is Int64 to begin with), I don't know why the dynamic filter version would be slower than the version with an arbitrary hard-coded filter. Maybe the dynamic filter wrapper adds a lot of overhead?
   
   But also: this is all *without* predicate pushdown turned on. So just enabling the aggregate dynamic filters seems to cause a regression for this query (0.135 seconds vs. 0.026 seconds above). cc @2010YOUY01. But that in turn means this doesn't explain why the query gets slower with predicate pushdown turned on 🤔


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

