nuno-faria commented on PR #103: URL: https://github.com/apache/datafusion-site/pull/103#issuecomment-3262612288
> I just gave this a read through and think it's looking great! I'd like to
add a benchmark showing join performance numbers (@nuno-faria I think you had
something already, would you like to use it and we can include you as an
author?), otherwise I think this is g2g!
Yes I can share some ad-hoc tests, using a simple join query with TPC-H data
(sf=20). The ideal execution plan for the following query is to first filter
`customer` by `c_phone` and then use the resulting data to filter `orders`,
which is now possible with the dynamic filter pushdown:
```sql
EXPLAIN ANALYZE
SELECT *
FROM customer
JOIN orders on c_custkey = o_custkey
WHERE c_phone = '25-989-741-2988';
```
At `49.0.2`, the entire `orders` table is retrieved:
```sql
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=11, elapsed_compute=15.756209ms]
|
| | HashJoinExec: mode=Partitioned, join_type=Inner,
on=[(c_custkey@0, o_custkey@1)], metrics=[output_rows=11,
elapsed_compute=144.066417ms, build_input_batches=1, build_input_rows=1,
input_batches=3522, input_rows=30000000, output_batches=3811,
build_mem_used=1096, build_time=3.5554ms, join_time=137.192905ms]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=1, elapsed_compute=15.606µs]
|
| | RepartitionExec:
partitioning=Hash([c_custkey@0], 12), input_partitions=12,
metrics=[fetch_time=2.0014852s, repartition_time=17.211µs, send_time=7.143µs]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=1, elapsed_compute=33.404µs]
|
| | FilterExec: c_phone@4 = 25-989-741-2988,
metrics=[output_rows=1, elapsed_compute=28.9586ms]
|
| | DataSourceExec: file_groups={12 groups:
[[customer.parquet:0..20627576], [customer.parquet:20627576..41255152],
[customer.parquet:41255152..61882728], [customer.parquet:61882728..82510304],
[customer.parquet:82510304..103137880], ...]}, projection=[c_custkey, c_name,
c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment],
file_type=parquet, predicate=c_phone@4 = 25-989-741-2988,
pruning_predicate=c_phone_null_count@2 != row_count@3 AND c_phone_min@0 <=
25-989-741-2988 AND 25-989-741-2988 <= c_phone_max@1,
required_guarantees=[c_phone in (25-989-741-2988)]
|
| | , metrics=[output_rows=3000000, elapsed_compute=12ns,
batches_splitted=0, bytes_scanned=247499357, file_open_errors=0,
file_scan_errors=0, files_ranges_pruned_statistics=0,
num_predicate_creation_errors=0, page_index_rows_matched=0,
page_index_rows_pruned=0, predicate_evaluation_errors=0,
pushdown_rows_matched=0, pushdown_rows_pruned=0,
row_groups_matched_bloom_filter=0, row_groups_matched_statistics=25,
row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0,
bloom_filter_eval_time=503.212µs, metadata_load_time=36.498512ms,
page_index_eval_time=420ns, row_pushdown_eval_time=24ns,
statistics_eval_time=382.712µs, time_elapsed_opening=42.3593ms,
time_elapsed_processing=1.4428838s, time_elapsed_scanning_total=1.9582305s,
time_elapsed_scanning_until_data=901.7049ms]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=30000000, elapsed_compute=14.531075907s]
|
| | RepartitionExec:
partitioning=Hash([o_custkey@1], 12), input_partitions=12,
metrics=[fetch_time=10.0157432s, repartition_time=4.3250056s,
send_time=5.8247922s]
|
| | DataSourceExec: file_groups={12 groups:
[[orders.parquet:0..105903266], [orders.parquet:105903266..211806532],
[orders.parquet:211806532..317709798], [orders.parquet:317709798..423613064],
[orders.parquet:423613064..529516330], ...]}, projection=[o_orderkey,
o_custkey, o_orderstatus, o_totalprice, o_orderdate, o_orderpriority, o_clerk,
o_shippriority, o_comment], file_type=parquet, metrics=[output_rows=30000000,
elapsed_compute=12ns, batches_splitted=0, bytes_scanned=1269546359,
file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0,
num_predicate_creation_errors=0, page_index_rows_matched=0,
page_index_rows_pruned=0, predicate_evaluation_errors=0,
pushdown_rows_matched=0, pushdown_rows_pruned=0,
row_groups_matched_bloom_filter=0, row_groups_matched_statistics=0,
row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0,
bloom_filter_eval_time=24ns, metadata_load_time=124.885412ms,
page_index_eval_time=24ns, row_pushdown_eval_
time=24ns, statistics_eval_time=24ns, time_elapsed_opening=127.657ms,
time_elapsed_processing=8.2648902s, time_elapsed_scanning_total=27.819907s,
time_elapsed_scanning_until_data=432.8593ms]
|
| |
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2.628s
output_rows from orders = 30M of 30M
```
At `50.0.0`, the dynamic filter will be able to cut down a large percentage
of the table:
```sql
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=11, elapsed_compute=1.219905ms]
|
| | HashJoinExec: mode=Partitioned, join_type=Inner,
on=[(c_custkey@0, o_custkey@1)], metrics=[output_rows=11,
elapsed_compute=47.499214ms, build_input_batches=1, build_input_rows=1,
input_batches=437, input_rows=3686400, output_batches=473, build_mem_used=1096,
build_time=19.4876ms, join_time=8.703002ms]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=1, elapsed_compute=13.004µs]
|
| | RepartitionExec:
partitioning=Hash([c_custkey@0], 12), input_partitions=12,
metrics=[fetch_time=1.9160576s, repartition_time=10.611µs, send_time=5.443µs]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=1, elapsed_compute=25.906µs]
|
| | FilterExec: c_phone@4 = 25-989-741-2988,
metrics=[output_rows=1, elapsed_compute=13.0737ms]
|
| | DataSourceExec: file_groups={12 groups:
[[customer.parquet:0..20627576], [customer.parquet:20627576..41255152],
[customer.parquet:41255152..61882728], [customer.parquet:61882728..82510304],
[customer.parquet:82510304..103137880], ...]}, projection=[c_custkey, c_name,
c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment],
file_type=parquet, predicate=c_phone@4 = 25-989-741-2988,
pruning_predicate=c_phone_null_count@2 != row_count@3 AND c_phone_min@0 <=
25-989-741-2988 AND 25-989-741-2988 <= c_phone_max@1,
required_guarantees=[c_phone in (25-989-741-2988)],
metrics=[output_rows=3000000, elapsed_compute=12ns, batches_split=0,
bytes_scanned=247499357, file_open_errors=0, file_scan_errors=0,
files_ranges_pruned_statistics=0, num_predicate_creation_errors=0,
page_index_rows_matched=0, page_index_rows_pruned=0,
predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0,
row_groups_matched_bloom_filter=0, row_groups_matc
hed_statistics=25, row_groups_pruned_bloom_filter=0,
row_groups_pruned_statistics=0, bloom_filter_eval_time=583.912µs,
metadata_load_time=4.968212ms, page_index_eval_time=420ns,
row_pushdown_eval_time=24ns, statistics_eval_time=491.812µs,
time_elapsed_opening=11.2265ms, time_elapsed_processing=1.4416089s,
time_elapsed_scanning_total=1.9039586s,
time_elapsed_scanning_until_data=894.5533ms]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=3686400, elapsed_compute=654.916706ms]
|
| | RepartitionExec:
partitioning=Hash([o_custkey@1], 12), input_partitions=12,
metrics=[fetch_time=786.0281ms, repartition_time=215.841606ms,
send_time=50.588072ms]
|
| | DataSourceExec: file_groups={12 groups:
[[orders.parquet:0..105903266], [orders.parquet:105903266..211806532],
[orders.parquet:211806532..317709798], [orders.parquet:317709798..423613064],
[orders.parquet:423613064..529516330], ...]}, projection=[o_orderkey,
o_custkey, o_orderstatus, o_totalprice, o_orderdate, o_orderpriority, o_clerk,
o_shippriority, o_comment], file_type=parquet,
predicate=DynamicFilterPhysicalExpr [ o_custkey@1 >= 1 AND o_custkey@1 <= 1 ],
pruning_predicate=o_custkey_null_count@1 != row_count@2 AND o_custkey_max@0 >=
1 AND o_custkey_null_count@1 != row_count@2 AND o_custkey_min@3 <= 1,
required_guarantees=[], metrics=[output_rows=3686400, elapsed_compute=12ns,
batches_split=0, bytes_scanned=155993981, file_open_errors=0,
file_scan_errors=0, files_ranges_pruned_statistics=0,
num_predicate_creation_errors=0, page_index_rows_matched=0,
page_index_rows_pruned=0, predicate_evaluation_errors=0,
pushdown_rows_matched=0, pushdown_rows_prun
ed=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=9,
row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=215,
bloom_filter_eval_time=189.513µs, metadata_load_time=178.664112ms,
page_index_eval_time=717ns, row_pushdown_eval_time=24ns,
statistics_eval_time=429.513µs, time_elapsed_opening=181.4318ms,
time_elapsed_processing=691.7683ms, time_elapsed_scanning_total=895.1489ms,
time_elapsed_scanning_until_data=137.6795ms]
|
| |
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
0.688s
output_rows from orders = 3.6M of 30M
```
However, the percentage of the table that is filtered out varies from
execution to execution, since the build and probe sides are built concurrently,
and sometimes more data is retrieved from the probe side before the dynamic
filter is complete (additional context:
https://github.com/apache/datafusion/pull/17197#issuecomment-3198010057). To
improve this, we can enable `datafusion.execution.parquet.pushdown_filters`:
```sql
set datafusion.execution.parquet.pushdown_filters = true;
```
Now, the dynamic filter consistently filters out the entire table, selecting
only the relevant rows:
```sql
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=11, elapsed_compute=6.905µs]
|
| | HashJoinExec: mode=Partitioned, join_type=Inner,
on=[(c_custkey@0, o_custkey@1)], metrics=[output_rows=11,
elapsed_compute=4.379914ms, build_input_batches=1, build_input_rows=1,
input_batches=1, input_rows=11, output_batches=1, build_mem_used=1096,
build_time=2.2824ms, join_time=32.702µs]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=1, elapsed_compute=20.806µs]
|
| | RepartitionExec:
partitioning=Hash([c_custkey@0], 12), input_partitions=12,
metrics=[fetch_time=245.4898ms, repartition_time=15.111µs, send_time=5.543µs]
|
| | DataSourceExec: file_groups={12 groups:
[[customer.parquet:0..20627576], [customer.parquet:20627576..41255152],
[customer.parquet:41255152..61882728], [customer.parquet:61882728..82510304],
[customer.parquet:82510304..103137880], ...]}, projection=[c_custkey, c_name,
c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment],
file_type=parquet, predicate=c_phone@4 = 25-989-741-2988,
pruning_predicate=c_phone_null_count@2 != row_count@3 AND c_phone_min@0 <=
25-989-741-2988 AND 25-989-741-2988 <= c_phone_max@1,
required_guarantees=[c_phone in (25-989-741-2988)], metrics=[output_rows=1,
elapsed_compute=12ns, batches_split=0, bytes_scanned=44262371,
file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0,
num_predicate_creation_errors=0, page_index_rows_matched=0,
page_index_rows_pruned=0, predicate_evaluation_errors=0,
pushdown_rows_matched=1, pushdown_rows_pruned=2999999,
row_groups_matched_bloom_filter=0, row_groups_matched_s
tatistics=25, row_groups_pruned_bloom_filter=0,
row_groups_pruned_statistics=0, bloom_filter_eval_time=604.112µs,
metadata_load_time=3.295812ms, page_index_eval_time=520ns,
row_pushdown_eval_time=5.210212ms, statistics_eval_time=344.112µs,
time_elapsed_opening=9.6014ms, time_elapsed_processing=162.9916ms,
time_elapsed_scanning_total=235.7456ms,
time_elapsed_scanning_until_data=228.1358ms]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=11, elapsed_compute=161.908µs]
|
| | RepartitionExec:
partitioning=Hash([o_custkey@1], 12), input_partitions=12,
metrics=[fetch_time=278.4473ms, repartition_time=211.906µs, send_time=43.638µs]
|
| | DataSourceExec: file_groups={12 groups:
[[orders.parquet:0..105903266], [orders.parquet:105903266..211806532],
[orders.parquet:211806532..317709798], [orders.parquet:317709798..423613064],
[orders.parquet:423613064..529516330], ...]}, projection=[o_orderkey,
o_custkey, o_orderstatus, o_totalprice, o_orderdate, o_orderpriority, o_clerk,
o_shippriority, o_comment], file_type=parquet,
predicate=DynamicFilterPhysicalExpr [ o_custkey@1 >= 1 AND o_custkey@1 <= 1 ],
pruning_predicate=o_custkey_null_count@1 != row_count@2 AND o_custkey_max@0 >=
1 AND o_custkey_null_count@1 != row_count@2 AND o_custkey_min@3 <= 1,
required_guarantees=[], metrics=[output_rows=11, elapsed_compute=12ns,
batches_split=0, bytes_scanned=68872932, file_open_errors=0,
file_scan_errors=0, files_ranges_pruned_statistics=0,
num_predicate_creation_errors=0, page_index_rows_matched=0,
page_index_rows_pruned=0, predicate_evaluation_errors=0,
pushdown_rows_matched=11, pushdown_rows_pruned=36
86389, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=10,
row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=215,
bloom_filter_eval_time=231.613µs, metadata_load_time=35.709112ms,
page_index_eval_time=519ns, row_pushdown_eval_time=6.038918ms,
statistics_eval_time=361.613µs, time_elapsed_opening=38.8879ms,
time_elapsed_processing=224.2144ms, time_elapsed_scanning_total=239.723ms,
time_elapsed_scanning_until_data=131.8035ms]
|
| |
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
0.104s
output_rows from orders = 11 of 30M
```
This results in a >20x speed up. Here is an overview (the `pushdown_filters`
was enabled in 49.0.2 for consistency, but does not change the runtime
considerably):
```
┌────────────────────────────┬────────────────────┬─────────────────────────────────┐
│ │ Execution time (s) │ Filtered out rows from
"orders" │
├────────────────────────────┼────────────────────┼─────────────────────────────────┤
│ 49.0.2 │ 2.521 │ 0%
│
│ [pushdown_filters=true] │ │
│
├────────────────────────────┼────────────────────┼─────────────────────────────────┤
│ 50.0.0 │ 0.688 │ ~88%
│
│ [dynamic_filter_pushdown] │ │
│
├────────────────────────────┼────────────────────┼─────────────────────────────────┤
│ 50.0.0 │ 0.104 │ ~100%
│
│ [dynamic_filter_pushdown │ │
│
│ & pushdown_filters=true] │ │
│
└────────────────────────────┴────────────────────┴─────────────────────────────────┘
```
<img width="786" height="365" alt="image"
src="https://github.com/user-attachments/assets/e32ba7bb-4f5a-400f-bbb8-277d859a98a6"
/>
I also tried to show that using multiple filters would result in good
performance. However, I found some inconclusive results. For example, here is
the same query as the one above, but with a `c_phone = '25-989-741-2988' or
c_phone = '29-590-168-8634'` filter:
```sql
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=26, elapsed_compute=12.708µs]
|
| | HashJoinExec: mode=Partitioned, join_type=Inner,
on=[(c_custkey@0, o_custkey@1)], metrics=[output_rows=26,
elapsed_compute=5.897813ms, build_input_batches=2, build_input_rows=2,
input_batches=2, input_rows=26, output_batches=2, build_mem_used=1345,
build_time=3.0268ms, join_time=59.501µs]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=2, elapsed_compute=46.206µs]
|
| | RepartitionExec:
partitioning=Hash([c_custkey@0], 12), input_partitions=12,
metrics=[fetch_time=270.5539ms, repartition_time=30.21µs, send_time=9.042µs]
|
| | DataSourceExec: file_groups={12 groups:
[[customer.parquet:0..20627576], [customer.parquet:20627576..41255152],
[customer.parquet:41255152..61882728], [customer.parquet:61882728..82510304],
[customer.parquet:82510304..103137880], ...]}, projection=[c_custkey, c_name,
c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment],
file_type=parquet, predicate=c_phone@4 = 25-989-741-2988 OR c_phone@4 =
29-590-168-8634, pruning_predicate=c_phone_null_count@2 != row_count@3 AND
c_phone_min@0 <= 25-989-741-2988 AND 25-989-741-2988 <= c_phone_max@1 OR
c_phone_null_count@2 != row_count@3 AND c_phone_min@0 <= 29-590-168-8634 AND
29-590-168-8634 <= c_phone_max@1, required_guarantees=[c_phone in
(25-989-741-2988, 29-590-168-8634)], metrics=[output_rows=2,
elapsed_compute=12ns, batches_split=0, bytes_scanned=47855367,
file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0,
num_predicate_creation_errors=0, page_index_rows_matched=0, page_i
ndex_rows_pruned=0, predicate_evaluation_errors=0, pushdown_rows_matched=2,
pushdown_rows_pruned=2999998, row_groups_matched_bloom_filter=0,
row_groups_matched_statistics=25, row_groups_pruned_bloom_filter=0,
row_groups_pruned_statistics=0, bloom_filter_eval_time=629.912µs,
metadata_load_time=2.153612ms, page_index_eval_time=519ns,
row_pushdown_eval_time=9.425812ms, statistics_eval_time=257.312µs,
time_elapsed_opening=6.6487ms, time_elapsed_processing=178.9948ms,
time_elapsed_scanning_total=263.8138ms,
time_elapsed_scanning_until_data=255.8887ms]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=26, elapsed_compute=606.205µs]
|
| | RepartitionExec:
partitioning=Hash([o_custkey@1], 12), input_partitions=12,
metrics=[fetch_time=952.4997ms, repartition_time=720.001µs,
send_time=177.128µs]
|
| | DataSourceExec: file_groups={12 groups:
[[orders.parquet:0..105903266], [orders.parquet:105903266..211806532],
[orders.parquet:211806532..317709798], [orders.parquet:317709798..423613064],
[orders.parquet:423613064..529516330], ...]}, projection=[o_orderkey,
o_custkey, o_orderstatus, o_totalprice, o_orderdate, o_orderpriority, o_clerk,
o_shippriority, o_comment], file_type=parquet,
predicate=DynamicFilterPhysicalExpr [ o_custkey@1 >= 2999998 AND o_custkey@1 <=
2999998 OR o_custkey@1 >= 1 AND o_custkey@1 <= 1 ],
pruning_predicate=o_custkey_null_count@1 != row_count@2 AND o_custkey_max@0 >=
2999998 AND o_custkey_null_count@1 != row_count@2 AND o_custkey_min@3 <=
2999998 OR o_custkey_null_count@1 != row_count@2 AND o_custkey_max@0 >= 1 AND
o_custkey_null_count@1 != row_count@2 AND o_custkey_min@3 <= 1,
required_guarantees=[], metrics=[output_rows=26, elapsed_compute=12ns,
batches_split=0, bytes_scanned=182217584, file_open_errors=0,
file_scan_errors=0, f
iles_ranges_pruned_statistics=0, num_predicate_creation_errors=0,
page_index_rows_matched=0, page_index_rows_pruned=0,
predicate_evaluation_errors=0, pushdown_rows_matched=26,
pushdown_rows_pruned=13516774, row_groups_matched_bloom_filter=0,
row_groups_matched_statistics=28, row_groups_pruned_bloom_filter=0,
row_groups_pruned_statistics=135, bloom_filter_eval_time=855.716µs,
metadata_load_time=21.491212ms, page_index_eval_time=916ns,
row_pushdown_eval_time=42.977712ms, statistics_eval_time=515.616µs,
time_elapsed_opening=27.1932ms, time_elapsed_processing=644.1071ms,
time_elapsed_scanning_total=926.2487ms,
time_elapsed_scanning_until_data=504.9686ms]
|
| |
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
0.166s
output_rows from orders = 26 of 30M
[ o_custkey@1 >= 2999998 AND o_custkey@1 <= 2999998 OR o_custkey@1 >= 1 AND
o_custkey@1 <= 1 ]
```
The dynamic filter is pretty efficient. However, consider the same query,
but using `c_phone = '25-989-741-2988' or c_phone = '29-699-674-3526'` as the
filter:
```sql
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=35, elapsed_compute=15.718506ms]
|
| | HashJoinExec: mode=Partitioned, join_type=Inner,
on=[(c_custkey@0, o_custkey@1)], metrics=[output_rows=35,
elapsed_compute=147.439515ms, build_input_batches=1, build_input_rows=2,
input_batches=3520, input_rows=30000000, output_batches=3810,
build_mem_used=1516, build_time=4.1056ms, join_time=139.447503ms]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=2, elapsed_compute=50.408µs]
|
| | RepartitionExec:
partitioning=Hash([c_custkey@0], 12), input_partitions=12,
metrics=[fetch_time=289.6309ms, repartition_time=40.51µs, send_time=10.542µs]
|
| | DataSourceExec: file_groups={12 groups:
[[customer.parquet:0..20627576], [customer.parquet:20627576..41255152],
[customer.parquet:41255152..61882728], [customer.parquet:61882728..82510304],
[customer.parquet:82510304..103137880], ...]}, projection=[c_custkey, c_name,
c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment],
file_type=parquet, predicate=c_phone@4 = 25-989-741-2988 OR c_phone@4 =
29-699-674-3526, pruning_predicate=c_phone_null_count@2 != row_count@3 AND
c_phone_min@0 <= 25-989-741-2988 AND 25-989-741-2988 <= c_phone_max@1 OR
c_phone_null_count@2 != row_count@3 AND c_phone_min@0 <= 29-699-674-3526 AND
29-699-674-3526 <= c_phone_max@1, required_guarantees=[c_phone in
(25-989-741-2988, 29-699-674-3526)], metrics=[output_rows=2,
elapsed_compute=12ns, batches_split=0, bytes_scanned=47855367,
file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0,
num_predicate_creation_errors=0, page_index_rows_matched=0, page_i
ndex_rows_pruned=0, predicate_evaluation_errors=0, pushdown_rows_matched=2,
pushdown_rows_pruned=2999998, row_groups_matched_bloom_filter=0,
row_groups_matched_statistics=25, row_groups_pruned_bloom_filter=0,
row_groups_pruned_statistics=0, bloom_filter_eval_time=999.212µs,
metadata_load_time=16.355212ms, page_index_eval_time=817ns,
row_pushdown_eval_time=14.076612ms, statistics_eval_time=765.912µs,
time_elapsed_opening=28.8661ms, time_elapsed_processing=224.1269ms,
time_elapsed_scanning_total=260.453ms,
time_elapsed_scanning_until_data=252.3484ms]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=30000000, elapsed_compute=13.705637305s]
|
| | RepartitionExec:
partitioning=Hash([o_custkey@1], 12), input_partitions=12,
metrics=[fetch_time=11.5848169s, repartition_time=4.4595235s,
send_time=5.2565265s]
|
| | DataSourceExec: file_groups={12 groups:
[[orders.parquet:0..105903266], [orders.parquet:105903266..211806532],
[orders.parquet:211806532..317709798], [orders.parquet:317709798..423613064],
[orders.parquet:423613064..529516330], ...]}, projection=[o_orderkey,
o_custkey, o_orderstatus, o_totalprice, o_orderdate, o_orderpriority, o_clerk,
o_shippriority, o_comment], file_type=parquet,
predicate=DynamicFilterPhysicalExpr [ o_custkey@1 >= 1 AND o_custkey@1 <=
2999999 ], pruning_predicate=o_custkey_null_count@1 != row_count@2 AND
o_custkey_max@0 >= 1 AND o_custkey_null_count@1 != row_count@2 AND
o_custkey_min@3 <= 2999999, required_guarantees=[],
metrics=[output_rows=30000000, elapsed_compute=12ns, batches_split=0,
bytes_scanned=1269546359, file_open_errors=0, file_scan_errors=0,
files_ranges_pruned_statistics=0, num_predicate_creation_errors=0,
page_index_rows_matched=0, page_index_rows_pruned=0,
predicate_evaluation_errors=0, pushdown_rows_matched=3000000
0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0,
row_groups_matched_statistics=245, row_groups_pruned_bloom_filter=0,
row_groups_pruned_statistics=0, bloom_filter_eval_time=4.080812ms,
metadata_load_time=29.209612ms, page_index_eval_time=717ns,
row_pushdown_eval_time=88.763912ms, statistics_eval_time=367.312µs,
time_elapsed_opening=36.8624ms, time_elapsed_processing=9.1036123s,
time_elapsed_scanning_total=28.005503s,
time_elapsed_scanning_until_data=496.3794ms]
|
| |
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2.508s
output_rows from orders = 30M of 30M
[ o_custkey@1 >= 1 AND o_custkey@1 <= 2999999 ]
```
The dynamic filter now combines the partial filters, which ends up
retrieving the entire table. I think it might be worth looking at always
combining the partial filters with `OR`, unless maybe if the dynamic filter
becomes too large.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
