Mark1626 opened a new issue, #18290:
URL: https://github.com/apache/datafusion/issues/18290

   I'm using DataFusion `50.1.0` with dynamic filter pushdown enabled. The 
following query is a small section of TPC-DS q54. I noticed a large slowdown, 
and `explain analyze` showed that `web_sales` was scanned completely. The plans 
and the `explain analyze` output of the scan are 
[here](https://gist.github.com/Mark1626/c7e54ce3c3dd5d77dac825ee3f072ab7).
   
   ```sql
     select 
       i_item_id item_id, 
       sum(ws_ext_sales_price) ws_item_rev 
     from 
       web_sales, 
       item, 
       date_dim 
     where 
       ws_item_sk = i_item_sk 
       and d_date in (
         select 
           d_date 
         from 
           date_dim 
         where 
           d_week_seq =(
             select 
               d_week_seq 
             from 
               date_dim 
             where 
               d_date = '1998-02-19'
           )
       ) 
       and ws_sold_date_sk = d_date_sk 
     group by 
       i_item_id
   ```
   
   Dynamic filter
   
   ```
   predicate=DynamicFilterPhysicalExpr [ ws_item_sk@0 >= 1 AND ws_item_sk@0 <= 
54000 ] AND DynamicFilterPhysicalExpr [ ws_sold_date_sk@2 >= 2415022 AND 
ws_sold_date_sk@2 <= 2488070 ], 
   ```
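   
   For context, min/max statistics pruning is essentially an interval-overlap test: a file (or row group) is skipped only when its statistics range for the filter column does not intersect the dynamic filter's bounds, so the wide `ws_sold_date_sk` range above overlaps virtually every file. A minimal sketch of that test, using hypothetical per-file statistics (not taken from this dataset):
   
   ```python
   def prune_files(file_stats, lo, hi):
       """Keep only files whose [min, max] statistics overlap [lo, hi]."""
       return [name for name, (fmin, fmax) in file_stats.items()
               if fmax >= lo and fmin <= hi]

   # Hypothetical per-file min/max for ws_sold_date_sk: one file per year,
   # with illustrative Julian-day-style keys (not real dataset statistics).
   file_stats = {
       f"sales_{year}": (2450815 + 365 * (year - 1998),
                         2450815 + 365 * (year - 1998) + 364)
       for year in range(1998, 2003)
   }

   # Wide dynamic-filter bounds, as in the original query:
   # every file's range overlaps, so nothing can be pruned.
   wide = prune_files(file_stats, 2415022, 2488070)

   # Tight bounds, as in the simplified query:
   # only the one file covering that week's date keys survives.
   tight = prune_files(file_stats, 2450862, 2450868)
   ```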
   
   Explain Analyze
   ```
   metrics=[output_rows=3598645, elapsed_compute=14ns, batches_split=0, 
bytes_scanned=0, file_open_errors=0, file_scan_errors=0, 
files_ranges_pruned_statistics=0, num_predicate_creation_errors=0, 
page_index_rows_matched=0, page_index_rows_pruned=0, 
predicate_evaluation_errors=0, pushdown_rows_matched=7197290, 
pushdown_rows_pruned=858, row_groups_matched_bloom_filter=0, 
row_groups_matched_statistics=1800, row_groups_pruned_bloom_filter=0, 
row_groups_pruned_statistics=0, bloom_filter_eval_time=41.140467ms, 
metadata_load_time=356.618768ms, page_index_eval_time=205.72µs, 
row_pushdown_eval_time=284.704072ms, statistics_eval_time=40.716318ms, 
time_elapsed_opening=52.71329ms, time_elapsed_processing=2.034425262s, 
time_elapsed_scanning_total=9.825828938s, 
time_elapsed_scanning_until_data=6.676775892s]
   ```
   
   The same query, simplified, is much faster:
   
   ```sql
   select 
     i_item_id item_id, 
     sum(ws_ext_sales_price) ws_item_rev 
   from 
     web_sales, 
     item, 
     date_dim 
   where 
     ws_item_sk = i_item_sk 
     and d_date in (
       '1998-02-17', '1998-02-18', '1998-02-19', '1998-02-20', '1998-02-21', 
'1998-02-22', '1998-02-23'
     ) 
     and ws_sold_date_sk = d_date_sk 
   group by 
     i_item_id;
   ```
   
   Dynamic filter 
   
   A much smaller range is used now:
   
   ```
   predicate=DynamicFilterPhysicalExpr [ ws_item_sk@0 >= 1 AND ws_item_sk@0 <= 
54000 ] AND DynamicFilterPhysicalExpr [ ws_sold_date_sk@2 >= 2450862 AND 
ws_sold_date_sk@2 <= 2450868 ], 
   ```
   
   Explain Analyze
   
   Shows far fewer output rows, as many more file ranges are pruned:
   
   ```
   metrics=[output_rows=7880, elapsed_compute=14ns, batches_split=0, 
bytes_scanned=0, file_open_errors=0, file_scan_errors=0, 
files_ranges_pruned_statistics=1829, num_predicate_creation_errors=0, 
page_index_rows_matched=0, page_index_rows_pruned=0, 
predicate_evaluation_errors=0, pushdown_rows_matched=15760, 
pushdown_rows_pruned=858, row_groups_matched_bloom_filter=0, 
row_groups_matched_statistics=2, row_groups_pruned_bloom_filter=0, 
row_groups_pruned_statistics=0, bloom_filter_eval_time=44.38µs, 
metadata_load_time=931.791µs, page_index_eval_time=4.667µs, 
row_pushdown_eval_time=868.499µs, statistics_eval_time=59.672µs, 
time_elapsed_opening=4.838903ms, time_elapsed_processing=292.590292ms, 
time_elapsed_scanning_total=220.44782ms, 
time_elapsed_scanning_until_data=216.799489ms]
   ```
   
   I strongly suspect the plan and the join order, since in the simplified 
version many more file ranges are pruned (`files_ranges_pruned_statistics=1829` 
vs `0`).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

