[I] Improve scan pruning observability and IN predicate stats pruning [paimon-rust]

via GitHub Sat, 13 Jun 2026 06:57:42 -0700


hhhizzz opened a new issue, #385:
URL: https://github.com/apache/paimon-rust/issues/385


   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/apache/paimon-rust/issues) and found nothing similar.
   
   
   ### Motivation
   
   Paimon Rust scan planning has several metadata pruning paths, including 
partition pruning, bucket pruning, file min/max stats pruning, LIMIT split 
reduction, COUNT(*) statistics rewrite, and time-travel snapshot selection.
   
   Today these pruning decisions are hard to inspect from tests or physical 
plan output. This makes it difficult to identify cases that still do full scans 
or produce too many splits.
   
   There is also a specific gap for non-partition `IN` predicates: file min/max 
stats can prove that some files cannot match, but `IN` currently fails open and 
keeps those files.
   
   ## Proposal
   
   Add lightweight scan planning trace counters so tests and DataFusion 
physical plan display can show how many manifests, manifest entries, splits, 
and files survive each pruning stage.
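
As a rough illustration of what such counters could look like, here is a minimal sketch. The names `StageCount` and `ScanTrace`, their fields, and the display format are all hypothetical and are not taken from paimon-rust or from #381. For brevity it records a single before/after pair per unit rather than one pair per pruning stage.

```rust
use std::fmt;

/// How many items entered a pruning step and how many survived it.
/// Hypothetical type, for illustration only.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct StageCount {
    pub before: usize,
    pub after: usize,
}

impl StageCount {
    /// Overwrite the counters with the result of one pruning step.
    pub fn record(&mut self, before: usize, after: usize) {
        self.before = before;
        self.after = after;
    }

    /// Number of items removed by pruning (never underflows).
    pub fn pruned(&self) -> usize {
        self.before.saturating_sub(self.after)
    }
}

/// Counters for the units named in the proposal: manifests,
/// manifest entries, files, and splits. Hypothetical layout.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct ScanTrace {
    pub manifests: StageCount,
    pub manifest_entries: StageCount,
    pub files: StageCount,
    pub splits: StageCount,
}

impl fmt::Display for ScanTrace {
    /// Compact `survived/total` form that a test or a plan display could print.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "manifests={}/{}, entries={}/{}, files={}/{}, splits={}/{}",
            self.manifests.after,
            self.manifests.before,
            self.manifest_entries.after,
            self.manifest_entries.before,
            self.files.after,
            self.files.before,
            self.splits.after,
            self.splits.before,
        )
    }
}
```

A test could then build a `ScanTrace::default()`, call `record` on the relevant counter after planning, and assert on `pruned()` or on the formatted string instead of inspecting the surviving splits by hand.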
   
   Use the trace to add self-contained pruning baselines for:
   
   - partition pruning
   - bucket-key pruning
   - SQL BETWEEN partition pruning
   - LIMIT split reduction
   - COUNT(*) statistics rewrite
   - time-travel snapshot selection
   
   Separately, improve non-partition `IN` stats pruning by checking whether any 
`IN` literal overlaps the file min/max range. Keep conservative behavior for 
`NOT IN`, missing stats, corrupt stats, and unsupported comparisons.
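
The overlap check described above can be sketched as follows. This is only an illustration under simplifying assumptions: the column is a plain `i64`, NULL literals are not modeled, and `ColumnStats`, `file_may_match_in`, and `file_may_match_not_in` are hypothetical names, not the paimon-rust API or the code in #382.

```rust
/// Min/max statistics for one column of one data file.
/// Hypothetical type; `None` stands for a missing statistic.
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: Option<i64>,
    pub max: Option<i64>,
}

/// Returns `true` if the file may contain a row matching `col IN (literals)`,
/// and `false` only when the stats prove that no literal can match.
pub fn file_may_match_in(stats: &ColumnStats, literals: &[i64]) -> bool {
    let (min, max) = match (stats.min, stats.max) {
        // Usable stats: both bounds present and ordered.
        (Some(min), Some(max)) if min <= max => (min, max),
        // Missing or inconsistent stats: fail open and keep the file.
        _ => return true,
    };
    // Keep the file if at least one literal falls inside [min, max].
    literals.iter().any(|v| (min..=max).contains(v))
}

/// `NOT IN` stays conservative in this sketch: the file is always kept.
pub fn file_may_match_not_in(_stats: &ColumnStats, _literals: &[i64]) -> bool {
    true
}
```

For example, a file with range `[10, 20]` is kept for `IN (5, 15)` because `15` falls inside the range, and is pruned for `IN (5, 25)` because neither literal does. A file with a missing bound, or with `min > max`, is always kept.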
   
   
   ### Solution
   
   - #381 adds scan pruning trace counters and baseline tests.
   - #382 fixes non-partition `IN` stats pruning with file min/max stats.
   
   ### Anything else?
   
   _No response_
   
   ### Willingness to contribute
   
   - [x] I'm willing to submit a PR!


