[PR] feat(scan): add bucket pruning, DV/postpone filtering, and DE group pruning [paimon-rust]

via GitHub Sat, 04 Apr 2026 06:57:49 -0700


JingsongLi opened a new pull request, #205:
URL: https://github.com/apache/paimon-rust/pull/205


   
   
   ### Purpose
   
   Subtask of #173
   
   Major scan optimizations for `TableScan`:
   
   - Push partition and data predicate filters into `read_all_manifest_entries` so that entries are pruned early, concurrently with manifest reading.
   - Prune whole manifest files using their partition stats before reading them.
   - Deletion-vector (DV) level-0 filtering now applies only to primary-key (PK) tables; non-PK tables with DV keep their level-0 files.
   - Filter out postpone-bucket entries (bucket < 0) for PK tables.
   - Data evolution (DE) group-level predicate filtering: after `group_by_overlapping_row_id`, stats merged across the overlapping files of a group allow the entire group to be pruned.
   - Bucket predicate filtering: extract bucket-key predicates (Eq/In), compute the target buckets via MurmurHash3 (seed 42, word-aligned), and skip manifest entries whose bucket is not in the target set. Composite bucket keys are supported, and the bucket key defaults to the primary keys for PK tables.
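The entry-level and group-level filters above can be pictured with a small sketch. It is not the paimon-rust code: `Entry`, `TableInfo`, `keep_entry`, `range_may_contain`, `merged_range`, and `prune_groups` are hypothetical names, stats are reduced to one numeric column, and the group merge simply takes the widest min/max over files that carry stats. That is one conservative choice and may differ from what the PR actually does.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified manifest entry (not the paimon-rust type).
#[derive(Debug, Clone, Copy)]
struct Entry {
    bucket: i32,
    level: i32,
    /// (min, max) of one numeric column, or None if this file has no stats for it.
    stats: Option<(i64, i64)>,
}

/// Hypothetical table properties that drive the filters.
struct TableInfo {
    primary_key: bool,
    deletion_vectors: bool,
}

/// Entry-level filters: postpone bucket, DV level-0, and target-bucket pruning.
fn keep_entry(table: &TableInfo, target_buckets: Option<&HashSet<i32>>, e: &Entry) -> bool {
    // Postpone-bucket entries (bucket < 0) are dropped for PK tables only.
    if table.primary_key && e.bucket < 0 {
        return false;
    }
    // With DV, level-0 files are dropped for PK tables; non-PK tables keep them.
    if table.primary_key && table.deletion_vectors && e.level == 0 {
        return false;
    }
    // Bucket pruning applies only when a target set could be computed.
    target_buckets.map_or(true, |set| set.contains(&e.bucket))
}

/// Range check that could serve both manifest-file partition stats and
/// group stats; missing stats never allow pruning.
fn range_may_contain(range: Option<(i64, i64)>, v: i64) -> bool {
    match range {
        Some((lo, hi)) => lo <= v && v <= hi,
        None => true,
    }
}

/// Conservative merge over a group of files with overlapping row ids:
/// the widest (min, max) among files that carry stats for the column.
fn merged_range(group: &[Entry]) -> Option<(i64, i64)> {
    group
        .iter()
        .filter_map(|e| e.stats)
        .reduce(|(lo1, hi1), (lo2, hi2)| (lo1.min(lo2), hi1.max(hi2)))
}

/// Keep only the groups whose merged stats may contain `col = v`.
fn prune_groups(groups: Vec<Vec<Entry>>, v: i64) -> Vec<Vec<Entry>> {
    groups
        .into_iter()
        .filter(|g| range_may_contain(merged_range(g), v))
        .collect()
}
```

Merging before filtering matters because, in a data-evolution group, no single file necessarily describes every column of the rows, so pruning is decided per group rather than per file.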
   
   New files:
   - `murmur_hash.rs`: a Paimon-compatible 32-bit MurmurHash3 implementation, plus `compute_bucket_from_datums`, which builds a `BinaryRow` from typed `Datum` values to compute its bucket.
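A minimal sketch of word-aligned hashing and bucket assignment, assuming Paimon Java's conventions as I understand them: MurmurHash3 x86_32 over little-endian 4-byte words with no tail step, seed 42, a `BinaryRow` laid out as an 8-byte header/null-bit region followed by one 8-byte slot per fixed-length field, and `bucket = abs(hash % num_buckets)`. The function names are hypothetical and only a single INT bucket key is handled, so this is not the PR's `compute_bucket_from_datums`.

```rust
/// MurmurHash3 (x86, 32-bit) mixing constants.
const C1: u32 = 0xcc9e_2d51;
const C2: u32 = 0x1b87_3593;
/// Seed assumed to match Paimon's default hash seed.
const SEED: u32 = 42;

/// MurmurHash3 x86_32 restricted to inputs whose length is a multiple of 4:
/// every 4-byte little-endian word is mixed in and there is no tail step.
fn hash_bytes_by_words(bytes: &[u8], seed: u32) -> i32 {
    assert!(bytes.len() % 4 == 0, "input must be word-aligned");
    let mut h1 = seed;
    for w in bytes.chunks_exact(4) {
        let k1 = u32::from_le_bytes([w[0], w[1], w[2], w[3]])
            .wrapping_mul(C1)
            .rotate_left(15)
            .wrapping_mul(C2);
        h1 ^= k1;
        h1 = h1.rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64);
    }
    fmix(h1 ^ bytes.len() as u32) as i32
}

/// Final avalanche step of MurmurHash3.
fn fmix(mut h: u32) -> u32 {
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^= h >> 16;
    h
}

/// Assumed BinaryRow layout for one fixed-length field: an 8-byte region for
/// the row-kind header and null bits (all zero here), then an 8-byte slot whose
/// low 4 bytes hold the INT value in little-endian order.
fn single_int_row(value: i32) -> Vec<u8> {
    let arity = 1;
    let null_bits_bytes = (arity + 63 + 8) / 64 * 8; // 8 for arity 1
    let mut row = vec![0u8; null_bits_bytes + 8 * arity];
    row[null_bits_bytes..null_bits_bytes + 4].copy_from_slice(&value.to_le_bytes());
    row
}

/// Map a (possibly negative) hash to a bucket in [0, num_buckets).
fn bucket(hash: i32, num_buckets: i32) -> i32 {
    assert!(num_buckets > 0, "fixed bucket count required");
    (hash % num_buckets).abs()
}

/// Bucket for a table whose bucket key is a single INT column.
fn bucket_for_int_key(value: i32, num_buckets: i32) -> i32 {
    bucket(hash_bytes_by_words(&single_int_row(value), SEED), num_buckets)
}
```

Because `BinaryRow` sizes are always multiples of 8 bytes under this layout, the word-aligned variant never needs MurmurHash3's tail handling.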
   
   New `CoreOptions` methods: `bucket_key()`, `bucket()`
   
   New utility: `extract_predicate_for_keys()`, a generic predicate projection helper
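The predicate projection and target-bucket computation might look roughly like this. `Predicate`, `extract_for_keys`, and `target_buckets` are hypothetical simplifications (i32 literals, conjunctions only). They do not reproduce the signature of `extract_predicate_for_keys()`. The bucket function is passed in so the sketch stays independent of the hash. Composite keys are handled by taking the Cartesian product of the Eq/In candidate values.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified predicate tree with i32 literals.
#[derive(Debug, Clone)]
enum Predicate {
    Eq(String, i32),
    In(String, Vec<i32>),
    Gt(String, i32),
    And(Vec<Predicate>),
}

/// Project a predicate onto the key columns: flatten nested ANDs and keep the
/// conjuncts that reference a key column, dropping everything else.
fn extract_for_keys(pred: &Predicate, keys: &[&str]) -> Vec<Predicate> {
    match pred {
        Predicate::And(children) => children
            .iter()
            .flat_map(|c| extract_for_keys(c, keys))
            .collect(),
        Predicate::Eq(col, _) | Predicate::In(col, _) | Predicate::Gt(col, _)
            if keys.contains(&col.as_str()) =>
        {
            vec![pred.clone()]
        }
        _ => Vec::new(),
    }
}

/// Buckets a scan must read, or None if some bucket-key column is not pinned
/// by Eq/In (then no bucket can be skipped).
fn target_buckets(
    conjuncts: &[Predicate],
    keys: &[&str],
    bucket_of: impl Fn(&[i32]) -> i32,
) -> Option<HashSet<i32>> {
    // Candidate values per key column; None = unconstrained so far.
    let mut per_key: Vec<Option<Vec<i32>>> = vec![None; keys.len()];
    for p in conjuncts {
        let (col, vals) = match p {
            Predicate::Eq(c, v) => (c, vec![*v]),
            Predicate::In(c, vs) => (c, vs.clone()),
            _ => continue, // range predicates cannot pin a bucket
        };
        if let Some(i) = keys.iter().position(|k| *k == col.as_str()) {
            // Several Eq/In conjuncts on one column intersect.
            let merged = match per_key[i].take() {
                None => vals,
                Some(prev) => prev.into_iter().filter(|x| vals.contains(x)).collect(),
            };
            per_key[i] = Some(merged);
        }
    }
    let per_key: Vec<Vec<i32>> = per_key.into_iter().collect::<Option<_>>()?;
    // Cartesian product of candidate values, one bucket per combination.
    let mut combos: Vec<Vec<i32>> = vec![Vec::new()];
    for vals in &per_key {
        combos = combos
            .iter()
            .flat_map(|prefix| {
                vals.iter().map(move |v| {
                    let mut c = prefix.clone();
                    c.push(*v);
                    c
                })
            })
            .collect();
    }
    Some(combos.iter().map(|c| bucket_of(c.as_slice())).collect())
}
```

If any bucket-key column is constrained only by a range predicate, `target_buckets` returns `None` and every bucket has to be scanned.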
   
   Tests: unit tests for bucket computation, predicate extraction, and target 
bucket calculation; integration tests for PK-without-DV, non-PK-with-DV, 
postpone bucket, data evolution filtering, and bucket predicate filtering.
   
   
   
   ### Brief change log
   
   
   ### Tests
   
   
   ### API and Format
   
   
   ### Documentation
   
   

