Mark1626 opened a new issue, #18290: URL: https://github.com/apache/datafusion/issues/18290
I'm using datafusion `50.1.0` with dynamic filter pushdown enabled. The following query is a small section of TPC-DS q54. I noticed a large slow down, and noticed in explain analyze that `web_sales` was scanned completely. The plans and the explain analyze of the scan are [here](https://gist.github.com/Mark1626/c7e54ce3c3dd5d77dac825ee3f072ab7) ```sql select i_item_id item_id, sum(ws_ext_sales_price) ws_item_rev from web_sales, item, date_dim where ws_item_sk = i_item_sk and d_date in ( select d_date from date_dim where d_week_seq =( select d_week_seq from date_dim where d_date = '1998-02-19' ) ) and ws_sold_date_sk = d_date_sk group by i_item_id ``` Dynamic filter ``` predicate=DynamicFilterPhysicalExpr [ ws_item_sk@0 >= 1 AND ws_item_sk@0 <= 54000 ] AND DynamicFilterPhysicalExpr [ ws_sold_date_sk@2 >= 2415022 AND ws_sold_date_sk@2 <= 2488070 ], ``` Explain Analyze ``` metrics=[output_rows=3598645, elapsed_compute=14ns, batches_split=0, bytes_scanned=0, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0, num_predicate_creation_errors=0, page_index_rows_matched=0, page_index_rows_pruned=0, predicate_evaluation_errors=0, pushdown_rows_matched=7197290, pushdown_rows_pruned=858, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1800, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=41.140467ms, metadata_load_time=356.618768ms, page_index_eval_time=205.72µs, row_pushdown_eval_time=284.704072ms, statistics_eval_time=40.716318ms, time_elapsed_opening=52.71329ms, time_elapsed_processing=2.034425262s, time_elapsed_scanning_total=9.825828938s, time_elapsed_scanning_until_data=6.676775892s] ``` Same query simplified is much faster ``` select i_item_id item_id, sum(ws_ext_sales_price) ws_item_rev from web_sales, item, date_dim where ws_item_sk = i_item_sk and d_date in ( '1998-02-17', '1998-02-18', '1998-02-19', '1998-02-20', '1998-02-21', '1998-02-22', '1998-02-23' ) and ws_sold_date_sk = d_date_sk group by i_item_id; ``` Dynamic filter A much smaller range is used now ``` predicate=DynamicFilterPhysicalExpr [ ws_item_sk@0 >= 1 AND ws_item_sk@0 <= 54000 ] AND DynamicFilterPhysicalExpr [ ws_sold_date_sk@2 >= 2450862 AND ws_sold_date_sk@2 <= 2450868 ], ``` Explain Analyze Shows a lot less output rows as more rows pruned ``` metrics=[output_rows=7880, elapsed_compute=14ns, batches_split=0, bytes_scanned=0, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=1829, num_predicate_creation_errors=0, page_index_rows_matched=0, page_index_rows_pruned=0, predicate_evaluation_errors=0, pushdown_rows_matched=15760, pushdown_rows_pruned=858, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=2, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=44.38µs, metadata_load_time=931.791µs, page_index_eval_time=4.667µs, row_pushdown_eval_time=868.499µs, statistics_eval_time=59.672µs, time_elapsed_opening=4.838903ms, time_elapsed_processing=292.590292ms, time_elapsed_scanning_total=220.44782ms, time_elapsed_scanning_until_data=216.799489ms] ``` I strongly suspect the plan and the join order as in the simplified version a lot more files ranges are pruned -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
