nuno-faria commented on PR #17275: URL: https://github.com/apache/datafusion/pull/17275#issuecomment-3274964508
> @nuno-faria -- I added a config flag
>
> Can you possibly test that if you set
>
> ```sql
> set datafusion.execution.parquet.max_predicate_cache_size = 0
> ```
>
> That the I/O goes back to what it was like in 56.0.0?

Thanks. I tried with the latest commit but still see the same behavior.

```
❯ git log -1
commit 7a6ea93e7b995c216131cc304c34d55c7a2ed528 (HEAD -> alamb/update_arrow)
Author: Andrew Lamb <and...@nerdnetworks.org>
Date:   Tue Sep 9 14:24:56 2025 -0400

    Thread through max_predicate_cache_size, add test
```

Here is a `datafusion-cli` test:

```sql
DataFusion CLI v50.0.0
> set datafusion.execution.parquet.pushdown_filters = true;
0 row(s) fetched.
Elapsed 0.003 seconds.

> set datafusion.execution.parquet.max_predicate_cache_size = 0;
0 row(s) fetched.
Elapsed 0.001 seconds.

> copy (
    select i as k
    from generate_series(1, 1000000) as t(i)
    order by k
) to 't.parquet'
options (MAX_ROW_GROUP_SIZE 100000, DATA_PAGE_ROW_COUNT_LIMIT 1000, WRITE_BATCH_SIZE 1000, DICTIONARY_ENABLED FALSE);
+---------+
| count   |
+---------+
| 1000000 |
+---------+
1 row(s) fetched.
Elapsed 0.861 seconds.

> create external table t stored as parquet location 't.parquet';
0 row(s) fetched.
Elapsed 0.007 seconds.

> explain analyze select k from t where k = 123456;
total=9929 ranges=[125400..126482, 126482..127564, 127564..128646, 128646..129728, 129728..130810, 130810..131892, 131892..132974, 132974..134247, 134247..135329]
total=0 ranges=[]
+-------------------+----------------------------------------------------------------+
| plan_type         | plan                                                           |
+-------------------+----------------------------------------------------------------+
| Plan with Metrics | DataSourceExec: file_groups={1 group: [[/t.parquet]]}, projection=[k], file_type=parquet, predicate=k@0 = 123456, pruning_predicate=k_null_count@2 != row_count@3 AND k_min@0 <= 123456 AND 123456 <= k_max@1, required_guarantees=[k in (123456)], metrics=[output_rows=1, elapsed_compute=1ns, batches_split=0, bytes_scanned=9929, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0, num_predicate_creation_errors=0, page_index_rows_matched=1192, page_index_rows_pruned=98808, predicate_cache_inner_records=16384, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=1, pushdown_rows_pruned=1191, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=9, bloom_filter_eval_time=195.801µs, metadata_load_time=340.301µs, page_index_eval_time=233.801µs, row_pushdown_eval_time=57.201µs, statistics_eval_time=387.401µs, time_elapsed_opening=2.1128ms, time_elapsed_processing=6.9853ms, time_elapsed_scanning_total=5.324ms, time_elapsed_scanning_until_data=5.2613ms] |
|                   |                                                                |
+-------------------+----------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.016 seconds.
```
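
As a sanity check on the numbers above, here is a minimal Python sketch (not part of the original test) that sums the byte ranges printed before the plan and compares the result with the reported `bytes_scanned`. It assumes each `start..end` pair is a half-open range, as in Rust's range notation; the helper name `total_bytes` is made up for illustration.

```python
# Byte ranges copied from the debug line printed before the plan.
# Assumption: each pair is a half-open range [start, end).
ranges = [
    (125400, 126482),
    (126482, 127564),
    (127564, 128646),
    (128646, 129728),
    (129728, 130810),
    (130810, 131892),
    (131892, 132974),
    (132974, 134247),
    (134247, 135329),
]


def total_bytes(byte_ranges):
    """Sum the lengths of half-open (start, end) byte ranges."""
    return sum(end - start for start, end in byte_ranges)


# Value reported as bytes_scanned in the plan metrics.
bytes_scanned = 9929

print(len(ranges), total_bytes(ranges), total_bytes(ranges) == bytes_scanned)
# prints: 9 9929 True
```

The nine ranges add up to the same 9929 bytes shown in both the `total=` line and the `bytes_scanned` metric.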