alamb commented on issue #1116: URL: https://github.com/apache/datafusion/issues/1116#issuecomment-2656521042
When exact pushdown is enabled, there is no filter being applied in
DataFusion anymore 🎉
Specifically, there is no `FilterExec` in this plan:
```sql
> set datafusion.execution.parquet.pushdown_filters = true;
0 row(s) fetched.
Elapsed 0.000 seconds.
> explain select
count(*) as total_records,
"VendorID" as vid,
sum(cast(trip_distance as float)) as total_distance
from 'nyc_taxi.parquet'
where
passenger_count > 3 and
cast(trip_distance as float) < 5.00 and
fare_amount / (total_amount + 0.001) > 0.70 and
total_amount < 20.0 and
passenger_count is not null and
-passenger_count < -2 and
"VendorID" in ('2', '4')
group by "VendorID"
order by vid
limit 100;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: vid ASC NULLS LAST, fetch=100
|
| | Projection: count(*) AS total_records,
nyc_taxi.parquet.VendorID AS vid, sum(nyc_taxi.parquet.trip_distance) AS
total_distance
|
| | Aggregate: groupBy=[[nyc_taxi.parquet.VendorID]],
aggr=[[count(Int64(1)) AS count(*),
sum(CAST(CAST(nyc_taxi.parquet.trip_distance AS Float32) AS Float64))]]
|
| | TableScan: nyc_taxi.parquet projection=[VendorID,
trip_distance], full_filters=[nyc_taxi.parquet.passenger_count > Int64(3),
CAST(CAST(nyc_taxi.parquet.trip_distance AS Float32) AS Float64) < Float64(5),
nyc_taxi.parquet.fare_amount / (nyc_taxi.parquet.total_amount + Float64(0.001))
> Float64(0.7), nyc_taxi.parquet.total_amount < Float64(20),
nyc_taxi.parquet.passenger_count IS NOT NULL, (-
nyc_taxi.parquet.passenger_count) < Int64(-2), CAST(nyc_taxi.parquet.VendorID
AS Utf8) IN ([Utf8("2"), Utf8("4")])]
|
| physical_plan | SortPreservingMergeExec: [vid@1 ASC NULLS LAST], fetch=100
|
| | SortExec: TopK(fetch=100), expr=[vid@1 ASC NULLS LAST],
preserve_partitioning=[true]
|
| | ProjectionExec: expr=[count(*)@1 as total_records,
VendorID@0 as vid, sum(nyc_taxi.parquet.trip_distance)@2 as total_distance]
|
| | AggregateExec: mode=FinalPartitioned,
gby=[VendorID@0 as VendorID], aggr=[count(*),
sum(nyc_taxi.parquet.trip_distance)]
|
| | CoalesceBatchesExec: target_batch_size=8192
|
| | RepartitionExec: partitioning=Hash([VendorID@0],
16), input_partitions=16
|
| | AggregateExec: mode=Partial, gby=[VendorID@0
as VendorID], aggr=[count(*), sum(nyc_taxi.parquet.trip_distance)]
|
| | DataSourceExec: file_groups={16 groups:
[[Users/andrewlamb/Downloads/nyc_taxi.parquet:0..3122603],
[Users/andrewlamb/Downloads/nyc_taxi.parquet:3122603..6245206],
[Users/andrewlamb/Downloads/nyc_taxi.parquet:6245206..9367809],
[Users/andrewlamb/Downloads/nyc_taxi.parquet:9367809..12490412],
[Users/andrewlamb/Downloads/nyc_taxi.parquet:12490412..15613015], ...]},
projection=[VendorID, trip_distance], file_type=parquet,
predicate=passenger_count@3 > 3 AND CAST(CAST(trip_distance@4 AS Float32) AS
Float64) < 5 AND fare_amount@10 / (total_amount@16 + 0.001) > 0.7 AND
total_amount@16 < 20 AND passenger_count@3 IS NOT NULL AND (-
passenger_count@3) < -2 AND Use CAST(VendorID@0 AS Utf8) IN (SET) ([Literal {
value: Utf8("2") }, Literal { value: Utf8("4") }]),
pruning_predicate=passenger_count_null_count@1 != row_count@2 AND
passenger_count_max@0 > 3 AND total_amount_null_count@4 != row_count@2 AND
total_amount_min@3 < 20 AND passenger_count_null_count@1 != r
ow_count@2 AND passenger_count_null_count@1 != row_count@2 AND
passenger_count_max@0 > (- -2), required_guarantees=[] |
| |
|
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.011 seconds.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
