sushiljacksparrow opened a new pull request, #597:
URL: https://github.com/apache/hudi-rs/pull/597

   ## Change Logs
   
   Implements the re-scoped work from #205 (xushiyan's 2026-04-28 comment): 
parallelize `get_leaf_dirs` and add an optional per-level predicate so 
partition pruning short-circuits subtrees during descent.
   
   1. `get_leaf_dirs` (`crates/core/src/storage/mod.rs`) gains an optional 
predicate and a parallelism bound. Subtrees the predicate rejects are skipped 
without listing; sibling subdirs at each level are walked concurrently via 
`buffer_unordered`. Parallelism is bounded by `hoodie.plan.listing.parallelism` 
(default 10).
   2. `PartitionPruner::should_include_prefix` evaluates filters whose field is 
already present in a partial path. Filters for deeper fields are deferred. The 
single `_hoodie_partition_path` field used by timestamp-based key generators 
is conservatively admitted at intermediate levels; `should_include` still runs 
at the leaf as before.
   3. `FileLister::list_relevant_partition_paths` wires the two together: the 
predicate it passes to `get_leaf_dirs` skips lake-format metadata dirs 
(`.hoodie`, `_delta_log`, `metadata`) and calls `should_include_prefix` for 
partition pruning.
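   
   A minimal sketch of how the three pieces above fit together, not the code in 
this PR. It assumes hive-style `field=value` path segments and equality-only 
filters. `EqFilter`, `Tree`, `is_metadata_dir`, and `leaf_dirs` are hypothetical 
names, the walk is sequential rather than concurrent, and storage is replaced by 
an in-memory map so the simulated `LIST` calls can be counted.

```rust
use std::collections::BTreeMap;

/// Hypothetical equality filter on one partition field, e.g. `year = 2024`.
struct EqFilter {
    field: &'static str,
    value: &'static str,
}

/// In-memory stand-in for storage: full directory path -> child directory paths.
type Tree = BTreeMap<&'static str, Vec<&'static str>>;

/// Prefix check: a filter is applied only if its field already appears in the
/// partial path. Filters on deeper fields are deferred (treated as passing).
fn should_include_prefix(partial_path: &str, filters: &[EqFilter]) -> bool {
    // Collect the `field=value` segments present in the partial path.
    let present: BTreeMap<&str, &str> = partial_path
        .split('/')
        .filter_map(|segment| segment.split_once('='))
        .collect();
    filters.iter().all(|f| match present.get(f.field) {
        Some(value) => *value == f.value,
        None => true, // field not reached yet: decide at a deeper level
    })
}

/// True if the last path component is one of the metadata dirs named above.
fn is_metadata_dir(path: &str) -> bool {
    let name = path.rsplit('/').next().unwrap_or(path);
    matches!(name, ".hoodie" | "_delta_log" | "metadata")
}

/// Depth-first walk that never lists a subtree rejected by `keep`.
/// `list_calls` counts one simulated LIST per visited directory.
fn leaf_dirs(
    tree: &Tree,
    dir: &str,
    keep: &dyn Fn(&str) -> bool,
    list_calls: &mut usize,
) -> Vec<String> {
    *list_calls += 1;
    let children = tree.get(dir).cloned().unwrap_or_default();
    if children.is_empty() {
        return vec![dir.to_string()]; // no subdirectories: this is a leaf
    }
    let mut leaves = Vec::new();
    for child in children {
        if keep(child) {
            leaves.extend(leaf_dirs(tree, child, keep, list_calls));
        }
    }
    leaves
}
```

   In this toy model, a table with `year=2023` and `year=2024`, each holding 
`month=01` and `month=02`, needs 7 simulated `LIST` calls when only metadata 
dirs are skipped, and 3 when the predicate also applies `year = 2024` and 
`month = 02`.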
   
   Scope is the no-MDT listing path only. The metadata-table FILES path is 
unchanged.
   
   Closes #205.
   
   ## Impact
   
   Performance. On remote object stores with selective partition filters, every 
`LIST` for a pruned subtree is eliminated. Even un-filtered listings benefit 
from parallel descent.
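   
   The parallel-descent point can be illustrated with a small sketch. This is a 
stand-in under stated assumptions, not the PR's implementation: the PR bounds 
concurrency with `buffer_unordered` on an async stream, whereas the sketch uses 
scoped std threads in fixed-size batches so that it runs without extra crates. 
`Tree`, `list_children`, and `leaf_dirs_bounded` are hypothetical names, and the 
in-memory map simulates storage.

```rust
use std::collections::BTreeMap;
use std::thread;

/// In-memory stand-in for storage: full directory path -> child directory paths.
type Tree = BTreeMap<&'static str, Vec<&'static str>>;

/// Simulated LIST: returns the child directories of `dir`.
fn list_children(tree: &Tree, dir: &str) -> Vec<&'static str> {
    tree.get(dir).cloned().unwrap_or_default()
}

/// Level-by-level walk that lists at most `parallelism` directories at once.
fn leaf_dirs_bounded(
    tree: &Tree,
    root: &'static str,
    parallelism: usize,
) -> Vec<&'static str> {
    let mut frontier = vec![root];
    let mut leaves = Vec::new();
    while !frontier.is_empty() {
        let mut next = Vec::new();
        // Each batch holds at most `parallelism` sibling or cousin directories.
        for batch in frontier.chunks(parallelism.max(1)) {
            // One scoped thread per directory; all are joined before the scope ends.
            let results: Vec<(&'static str, Vec<&'static str>)> = thread::scope(|s| {
                let handles: Vec<_> = batch
                    .iter()
                    .map(|&dir| s.spawn(move || (dir, list_children(tree, dir))))
                    .collect();
                handles.into_iter().map(|h| h.join().unwrap()).collect()
            });
            for (dir, children) in results {
                if children.is_empty() {
                    leaves.push(dir); // no subdirectories: this is a leaf
                } else {
                    next.extend(children);
                }
            }
        }
        frontier = next;
    }
    leaves.sort(); // deterministic order regardless of batch size
    leaves
}
```

   One difference worth noting: fixed batches wait for the slowest listing in 
each batch, while `buffer_unordered` starts the next listing as soon as any 
in-flight one completes, so the async version should keep the bound better 
utilized.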
   
   No behavior change to the listed set on either path.
   
   ## Risk level
   
   low
   
   ## Documentation Update
   
   None required.

