nuno-faria commented on PR #17197:
URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3195614261

   @adriangb Issue 2 appears to be fixed; it has the correct filter now.
   
   Issue 1 still occurs on my end: the number of rows collected from `t1` keeps changing:
   ```sql
   DataSourceExec t2.parquet,  output_rows=20480
   DataSourceExec t1.parquet, output_rows=7902848 -- should be 20480
   
   DataSourceExec t2.parquet,  output_rows=20480
   DataSourceExec t1.parquet, output_rows=10000000 -- should be 20480
   
   DataSourceExec t2.parquet,  output_rows=20480
   DataSourceExec t1.parquet, output_rows=9437184 -- should be 20480
   ```
   
   Note that `t2` is filtered correctly and returns 20480 rows, but `t1` does not appear to be fully filtered by `predicate=DynamicFilterPhysicalExpr [ k@0 >= 1 AND k@0 <= 1 ]`. This is the result when I manually apply the filter:
   ```sql
   
   Plan with Metrics:
   CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=1, elapsed_compute=22.8µs]
     FilterExec: k@0 = 1, metrics=[output_rows=1, elapsed_compute=542.011µs]
       DataSourceExec: file_groups={12 groups: [[t1.parquet:0..1066678], [t1.parquet:1066678..2133356], [t1.parquet:2133356..3200034], [t1.parquet:3200034..4266712], [t1.parquet:4266712..5333390], ...]}, projection=[k], file_type=parquet, predicate=k@0 = 1, pruning_predicate=k_null_count@2 != row_count@3 AND k_min@0 <= 1 AND 1 <= k_max@1, required_guarantees=[k in (1)], metrics=[output_rows=20480, elapsed_compute=12ns, batches_splitted=0, bytes_scanned=206447, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=0, num_predicate_creation_errors=0, page_index_rows_matched=20480, page_index_rows_pruned=1028096, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=9, bloom_filter_eval_time=132.912µs, metadata_load_time=4.262112ms, page_index_eval_time=188.312µs, row_pushdown_eval_time=24ns, statistics_eval_time=1.258112ms, time_elapsed_opening=10.551ms, time_elapsed_processing=27.7574ms, time_elapsed_scanning_total=18.0588ms, time_elapsed_scanning_until_data=16.2569ms]
   ```
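
   The query behind the plan above is not included here. As a rough sketch, a plan of this shape (a scan of `t1.parquet` projecting `k`, with `k = 1` applied) could come from something like the following in `datafusion-cli`. The file location and the exact statement are assumptions, not the query that was actually run:
   ```sql
   -- Hypothetical reproduction: assumes t1.parquet is in the working
   -- directory and has a column named k.
   EXPLAIN ANALYZE
   SELECT k FROM 't1.parquet' WHERE k = 1;
   ```
   The `output_rows` metric on the `DataSourceExec` line is the figure to compare against the `t1` counts from the join runs.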


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

