nuno-faria commented on PR #17197: URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3195614261
@adriangb It appears issue 2 is fixed; it has the correct filter now.

Issue 1 still appears on my end: the number of rows collected from `t1` keeps changing:

```sql
DataSourceExec t2.parquet, output_rows=20480
DataSourceExec t1.parquet, output_rows=7902848 -- should be 20480

DataSourceExec t2.parquet, output_rows=20480
DataSourceExec t1.parquet, output_rows=10000000 -- should be 20480

DataSourceExec t2.parquet, output_rows=20480
DataSourceExec t1.parquet, output_rows=9437184 -- should be 20480
```

Note that `t2` is filtered correctly and returns 20480, but `t1` does not appear to be fully filtered by `predicate=DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 1 ]`. This is the result when I manually apply the filter:

```sql
+-------------------+----------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=1, elapsed_compute=22.8µs] |
|                   |   FilterExec: k@0 = 1, metrics=[output_rows=1, elapsed_compute=542.011µs] |
|                   |     DataSourceExec: file_groups={12 groups: [[t1.parquet:0..1066678], [t1.parquet:1066678..2133356], [t1.parquet:2133356..3200034], [t1.parquet:3200034..4266712], [t1.parquet:4266712..5333390], ...]}, projection=[k], file_type=parquet, predicate=k@0 = 1, pruning_predicate=k_null_count@2 != row_count@3 AND k_min@0 <= 1 AND 1 <= k_max@1, required_guarantees=[k in (1)] |
|                   | , metrics=[output_rows=20480, elapsed_compute=12ns, batches_splitted=0, bytes_scanned=206447, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0, num_predicate_creation_errors=0, page_index_rows_matched=20480, page_index_rows_pruned=1028096, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=9, bloom_filter_eval_time=132.912µs, metadata_load_time=4.262112ms, page_index_eval_time=188.312µs, row_pushdown_eval_time=24ns, statistics_eval_time=1.258112ms, time_elapsed_opening=10.551ms, time_elapsed_processing=27.7574ms, time_elapsed_scanning_total=18.0588ms, time_elapsed_scanning_until_data=16.2569ms] |
|                   | |
+-------------------+----------------------------------------+
```

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.