HippoBaro opened a new pull request, #9804:
URL: https://github.com/apache/arrow-rs/pull/9804
# Which issue does this PR close?
- Prerequisite to #9697
# Rationale for this change
#9697 aims to make staged buffer management in the push decoder more
explicit. In doing so, it exposes a structural problem: the logic for deciding
whether a row group is still live, skipped, or unreachable is spread across
several parts of the decoder.
This matters because row-group-level buffer release depends on a single
question having a clear answer: can this row group ever need bytes again? That
answer depends on the queued row groups, the remaining selection, the running
offset/limit budget, and whether predicates require the decoder to stay
conservative. Today, that state is split across multiple components, which
makes the release policy difficult to centralize cleanly.
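As a rough illustration of why a single owner for this state helps, the question can be sketched as one pure function over a snapshot of the scan state. Every name below (`RowGroupFate`, `ScanState`, `classify`) is hypothetical and is not taken from the arrow-rs code base; the rules are a simplified assumption about how the four inputs listed above might combine.

```rust
/// Hypothetical classification of a queued row group (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RowGroupFate {
    /// May still need bytes.
    Live,
    /// Contributes no output rows, but later row groups still might.
    Skipped,
    /// Can never be decoded because the limit is already satisfied.
    Unreachable,
}

/// Hypothetical snapshot of the cross-row-group scan state.
#[derive(Debug, Clone, Copy)]
struct ScanState {
    /// Rows the remaining selection picks from this row group.
    selected_rows: usize,
    /// Rows the offset still has to skip before any output.
    remaining_offset: usize,
    /// Rows the limit still allows (`None` means no limit).
    remaining_limit: Option<usize>,
    /// Whether predicates are present, which forces conservatism.
    has_predicates: bool,
}

fn classify(state: &ScanState) -> RowGroupFate {
    // Once the limit is spent, no further row group can produce output.
    if state.remaining_limit == Some(0) {
        return RowGroupFate::Unreachable;
    }
    // The selection picks nothing from this row group.
    if state.selected_rows == 0 {
        return RowGroupFate::Skipped;
    }
    // With predicates, the number of surviving rows is unknown until the
    // row group is decoded, so the offset cannot be charged up front.
    if state.has_predicates {
        return RowGroupFate::Live;
    }
    // Without predicates, the offset may swallow the whole row group.
    if state.remaining_offset >= state.selected_rows {
        return RowGroupFate::Skipped;
    }
    RowGroupFate::Live
}
```

In this sketch, only a row group classified as `Live` would need its bytes kept. When the inputs live in different components, no single place can evaluate a function like this.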
# What changes are included in this PR?
This PR introduces a clearer ownership boundary in the push decoder:
- cross-row-group scan state is now handled by a dedicated
frontier/look-ahead mechanism
- the row-group builder is reduced to current-row-group decode work only
- offset/limit accounting and row-group selection advancement are
centralized around that frontier/builder split
This PR does not implement row-group-level buffer release itself, but it
establishes the structure needed for that follow-up work. It should also make
future pruning rules easier to add and maintain.
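The centralized offset/limit accounting mentioned in the list above can be pictured as one running budget that each row group is charged against in order. This is a minimal sketch under assumed names (`Budget`, `charge`, `exhausted` are hypothetical and do not reflect the types extracted in this PR), and it ignores predicates entirely.

```rust
/// Hypothetical running offset/limit budget shared across row groups.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Budget {
    /// Rows still to skip before emitting anything.
    offset: usize,
    /// Rows still allowed to be emitted (`None` means no limit).
    limit: Option<usize>,
}

impl Budget {
    /// Charge a row group with `rows` selected rows against the budget and
    /// return how many of those rows should be emitted.
    fn charge(&mut self, rows: usize) -> usize {
        // The offset consumes rows first.
        let skipped = rows.min(self.offset);
        self.offset -= skipped;
        let available = rows - skipped;

        // The limit caps whatever the offset left over.
        let emitted = match self.limit {
            Some(limit) => available.min(limit),
            None => available,
        };
        if let Some(limit) = self.limit.as_mut() {
            *limit -= emitted;
        }
        emitted
    }

    /// True once the limit is spent, so later row groups are unreachable.
    fn exhausted(&self) -> bool {
        self.limit == Some(0)
    }
}
```

For example, with an offset of 150 and a limit of 120, three row groups of 100 selected rows each would emit 0, 50, and 70 rows, after which the budget is exhausted and any remaining row groups could be dropped from consideration.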
# Are these changes tested?
All existing tests pass, and the refactor adds focused coverage for the
extracted budget logic and the frontier-driven `try_next_reader` path.
# Are there any user-facing changes?
None.