hhhizzz opened a new issue, #385:
URL: https://github.com/apache/paimon-rust/issues/385
### Search before asking

- [x] I searched in the [issues](https://github.com/apache/paimon-rust/issues) and found nothing similar.

### Motivation

Paimon Rust scan planning has several metadata pruning paths, including partition pruning, bucket pruning, file min/max stats pruning, LIMIT split reduction, COUNT(*) statistics rewrite, and time-travel snapshot selection.

Today these pruning decisions are hard to inspect from tests or from physical plan output. This makes it difficult to identify cases that still do full scans or produce too many splits.

There is also a specific gap for non-partition `IN` predicates. File min/max stats can prove that some files cannot match, but `IN` currently fails open, so those files are always kept.

## Proposal

Add lightweight scan planning trace counters. Tests and the DataFusion physical plan display could then show how many manifests, manifest entries, splits, and files survive each pruning stage.

Use the trace to add self-contained pruning baselines for:

- partition pruning
- bucket-key pruning
- SQL `BETWEEN` partition pruning
- LIMIT split reduction
- COUNT(*) statistics rewrite
- time-travel snapshot selection

Separately, improve non-partition `IN` stats pruning by checking whether any `IN` literal overlaps the file's min/max range. Keep the conservative behavior for `NOT IN`, missing stats, corrupt stats, and unsupported comparisons.

### Solution

- #381 adds scan pruning trace counters and baseline tests.
- #382 fixes non-partition `IN` stats pruning using file min/max stats.

### Anything else?

_No response_

### Willingness to contribute

- [x] I'm willing to submit a PR!
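The trace counters described in the Proposal could be sketched roughly as follows. Everything here is hypothetical and is not the API added in #381: `ScanTrace`, `record`, `get`, `FileEntry`, `plan`, and the stage names are all illustrative. The real trace would cover manifests, manifest entries, splits, and files. This toy version only counts entries through partition and bucket pruning. It shows how a planner can record survivors per stage, render them in a `stage=count` form that plan display output could append, and let a baseline test assert on each stage.

```rust
use std::fmt;

/// Hypothetical per-scan trace: ordered (stage name, surviving count) pairs.
#[derive(Debug, Default)]
struct ScanTrace {
    stages: Vec<(&'static str, usize)>,
}

impl ScanTrace {
    /// Records how many items survived `stage`.
    fn record(&mut self, stage: &'static str, survivors: usize) {
        self.stages.push((stage, survivors));
    }

    /// Looks up the count recorded for `stage`, if any.
    fn get(&self, stage: &str) -> Option<usize> {
        self.stages.iter().find(|(s, _)| *s == stage).map(|(_, n)| *n)
    }
}

/// Renders the trace as `stage=count, ...` (e.g. for appending to plan output).
impl fmt::Display for ScanTrace {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (i, (stage, n)) in self.stages.iter().enumerate() {
            if i > 0 {
                write!(f, ", ")?;
            }
            write!(f, "{stage}={n}")?;
        }
        Ok(())
    }
}

/// Toy manifest entry with a partition value and a bucket id (hypothetical).
struct FileEntry {
    partition: &'static str,
    bucket: u32,
}

/// Toy planner: partition pruning, then bucket pruning, recording survivors.
fn plan<'a>(
    entries: &'a [FileEntry],
    partition: &str,
    bucket: u32,
    trace: &mut ScanTrace,
) -> Vec<&'a FileEntry> {
    trace.record("entries_total", entries.len());
    let after_partition: Vec<&FileEntry> =
        entries.iter().filter(|e| e.partition == partition).collect();
    trace.record("entries_after_partition", after_partition.len());
    let after_bucket: Vec<&FileEntry> =
        after_partition.into_iter().filter(|e| e.bucket == bucket).collect();
    trace.record("entries_after_bucket", after_bucket.len());
    after_bucket
}

fn main() {
    let entries = [
        FileEntry { partition: "dt=2024-01-01", bucket: 0 },
        FileEntry { partition: "dt=2024-01-01", bucket: 1 },
        FileEntry { partition: "dt=2024-01-02", bucket: 0 },
        FileEntry { partition: "dt=2024-01-02", bucket: 1 },
    ];
    let mut trace = ScanTrace::default();
    let survivors = plan(&entries, "dt=2024-01-01", 1, &mut trace);
    // A baseline test can pin each stage, not just the final result.
    assert_eq!(survivors.len(), 1);
    assert_eq!(trace.get("entries_after_partition"), Some(2));
    // Prints: entries_total=4, entries_after_partition=2, entries_after_bucket=1
    println!("{trace}");
}
```

Recording a count after every stage, rather than only the final split count, lets a baseline test show which pruning stage stopped working when a scan regresses toward a full scan.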
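The non-partition `IN` rule from the Proposal amounts to a min/max overlap check. The sketch below illustrates it under simplifying assumptions and is not the code in #382. `ColumnStats`, `Predicate`, `may_match`, and `literal_may_overlap` are hypothetical names, and values are any `PartialOrd` type rather than Paimon's actual datum types.

The sketch prunes a file only when every literal is provably below `min` or above `max`. The following cases all keep the file:

- missing stats
- corrupt stats (`min > max`)
- incomparable values (`partial_cmp` returning `None`, standing in for "unsupported comparisons")
- `NOT IN`

```rust
use std::cmp::Ordering;

/// Hypothetical per-file min/max stats for one column; `None` means missing.
struct ColumnStats<T> {
    min: Option<T>,
    max: Option<T>,
}

/// Simplified predicate shapes for this sketch only (hypothetical).
#[allow(dead_code)] // NotIn's literals are intentionally never inspected.
enum Predicate<T> {
    In(Vec<T>),
    NotIn(Vec<T>),
}

/// True unless `v` is provably outside [min, max]. An unsupported comparison
/// (`partial_cmp` returns `None`, e.g. a NaN literal) counts as a possible overlap.
fn literal_may_overlap<T: PartialOrd>(v: &T, min: &T, max: &T) -> bool {
    match (v.partial_cmp(min), v.partial_cmp(max)) {
        (Some(Ordering::Less), _) | (_, Some(Ordering::Greater)) => false,
        _ => true,
    }
}

/// Returns `true` if the file may contain matching rows (keep it) and
/// `false` only when its min/max stats prove no row can match (prune it).
fn may_match<T: PartialOrd>(pred: &Predicate<T>, stats: &ColumnStats<T>) -> bool {
    match pred {
        Predicate::In(literals) => {
            // Missing stats: fail open.
            let (Some(min), Some(max)) = (&stats.min, &stats.max) else {
                return true;
            };
            // Corrupt or incomparable bounds (min > max, NaN): fail open.
            if !matches!(min.partial_cmp(max), Some(Ordering::Less | Ordering::Equal)) {
                return true;
            }
            // Keep the file if at least one literal can fall inside [min, max].
            literals.iter().any(|v| literal_may_overlap(v, min, max))
        }
        // NOT IN stays conservative: always keep the file.
        Predicate::NotIn(_) => true,
    }
}

fn main() {
    let stats = ColumnStats { min: Some(10_i64), max: Some(20) };
    // Every literal is outside [10, 20]: the file can be pruned.
    assert!(!may_match(&Predicate::In(vec![1, 5, 25]), &stats));
    // 15 falls inside [10, 20]: keep.
    assert!(may_match(&Predicate::In(vec![1, 15]), &stats));
    // NOT IN is never pruned here.
    assert!(may_match(&Predicate::NotIn(vec![15]), &stats));
    // Missing max, corrupt bounds, and NaN literals all fail open.
    assert!(may_match(&Predicate::In(vec![1_i64]), &ColumnStats { min: Some(10), max: None }));
    assert!(may_match(&Predicate::In(vec![1_i64]), &ColumnStats { min: Some(30), max: Some(20) }));
    assert!(may_match(&Predicate::In(vec![f64::NAN]), &ColumnStats { min: Some(1.0), max: Some(2.0) }));
    println!("all IN pruning checks passed");
}
```

Every uncertain case returns `true` (keep). As a result, a gap in stats handling can only cost extra file reads and never drops matching rows.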
