nuno-faria commented on PR #17275:
URL: https://github.com/apache/datafusion/pull/17275#issuecomment-3274964508

   > @nuno-faria -- I added a config flag
   > 
   > Can you possibly test that if you set
   > 
   > ```sql
   > set datafusion.execution.parquet.max_predicate_cache_size = 0
   > ```
   > 
   > That the I/O goes back to what it was like in 56.0.0?
   
   Thanks. I tried with the latest commit but still see the same behavior.
   
   ```
   ❯ git log -1
   commit 7a6ea93e7b995c216131cc304c34d55c7a2ed528 (HEAD -> alamb/update_arrow)
   Author: Andrew Lamb <and...@nerdnetworks.org>
   Date:   Tue Sep 9 14:24:56 2025 -0400
   
       Thread through max_predicate_cache_size, add test
   ```
   
   Here is a `datafusion-cli` test:
   
   ```sql
   DataFusion CLI v50.0.0
   > set datafusion.execution.parquet.pushdown_filters = true;
   0 row(s) fetched.
   Elapsed 0.003 seconds.
   
   > set datafusion.execution.parquet.max_predicate_cache_size = 0;
   0 row(s) fetched.
   Elapsed 0.001 seconds.
   
   > copy (
               select i as k
               from generate_series(1, 1000000) as t(i)
               order by k
           ) to 't.parquet'
           options (MAX_ROW_GROUP_SIZE 100000, DATA_PAGE_ROW_COUNT_LIMIT 1000, WRITE_BATCH_SIZE 1000, DICTIONARY_ENABLED FALSE);
   +---------+
   | count   |
   +---------+
   | 1000000 |
   +---------+
   1 row(s) fetched.
   Elapsed 0.861 seconds.
   
   > create external table t stored as parquet location 't.parquet';
   0 row(s) fetched.
   Elapsed 0.007 seconds.
   
   > explain analyze select k from t where k = 123456;
   total=9929
   ranges=[125400..126482, 126482..127564, 127564..128646, 128646..129728, 129728..130810, 130810..131892, 131892..132974, 132974..134247, 134247..135329]
   total=0
   ranges=[]
   
   +-------------------+------+
   | plan_type         | plan |
   +-------------------+------+
   | Plan with Metrics | DataSourceExec: file_groups={1 group: [[/t.parquet]]}, projection=[k], file_type=parquet, predicate=k@0 = 123456, pruning_predicate=k_null_count@2 != row_count@3 AND k_min@0 <= 123456 AND 123456 <= k_max@1, required_guarantees=[k in (123456)], metrics=[output_rows=1, elapsed_compute=1ns, batches_split=0, bytes_scanned=9929, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0, num_predicate_creation_errors=0, page_index_rows_matched=1192, page_index_rows_pruned=98808, predicate_cache_inner_records=16384, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=1, pushdown_rows_pruned=1191, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=9, bloom_filter_eval_time=195.801µs, metadata_load_time=340.301µs, page_index_eval_time=233.801µs, row_pushdown_eval_time=57.201µs, statistics_eval_time=387.401µs, time_elapsed_opening=2.1128ms, time_elapsed_processing=6.9853ms, time_elapsed_scanning_total=5.324ms, time_elapsed_scanning_until_data=5.2613ms] |
   |                   |      |
   +-------------------+------+
   1 row(s) fetched.
   Elapsed 0.016 seconds.
   ```
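
As a sanity check on the numbers in the output above, here is a minimal Python sketch that cross-checks the printed byte ranges against the reported metrics. It assumes the `total=`/`ranges=` lines are debug prints of the byte ranges fetched from the file, and that each `a..b` is a half-open range of `b - a` bytes; neither is stated in the output itself. The helper names are illustrative only.

```python
# Byte ranges copied from the `ranges=[...]` line in the session above.
# Assumption: each (start, end) pair is half-open, i.e. end - start bytes.
ranges = [
    (125400, 126482), (126482, 127564), (127564, 128646),
    (128646, 129728), (129728, 130810), (130810, 131892),
    (131892, 132974), (132974, 134247), (134247, 135329),
]

# Metrics copied from the `explain analyze` plan above.
metrics = {
    "bytes_scanned": 9929,
    "page_index_rows_matched": 1192,
    "page_index_rows_pruned": 98808,
    "pushdown_rows_matched": 1,
    "pushdown_rows_pruned": 1191,
    "row_groups_matched_statistics": 1,
    "row_groups_pruned_statistics": 9,
}


def total_bytes(byte_ranges):
    """Sum the lengths of half-open (start, end) byte ranges."""
    return sum(end - start for start, end in byte_ranges)


def is_contiguous(byte_ranges):
    """True if every range starts exactly where the previous one ends."""
    return all(a[1] == b[0] for a, b in zip(byte_ranges, byte_ranges[1:]))


fetched = total_bytes(ranges)

# The ranges add up to the printed `total=` and to `bytes_scanned`.
assert fetched == metrics["bytes_scanned"] == 9929
assert is_contiguous(ranges)

# Page index pruning covers one full row group (MAX_ROW_GROUP_SIZE 100000).
assert metrics["page_index_rows_matched"] + metrics["page_index_rows_pruned"] == 100_000

# Row-level pushdown only sees the rows the page index let through.
assert metrics["pushdown_rows_matched"] + metrics["pushdown_rows_pruned"] == metrics["page_index_rows_matched"]

# 1,000,000 rows / 100,000 rows per group = 10 row groups in total.
assert metrics["row_groups_matched_statistics"] + metrics["row_groups_pruned_statistics"] == 10

print(fetched)
```

The checks only confirm that the figures are internally consistent; they say nothing about whether `max_predicate_cache_size = 0` changed the I/O pattern.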


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

