zhuqi-lucas commented on PR #10158: URL: https://github.com/apache/arrow-rs/pull/10158#issuecomment-4775559633
Addressed in 74435caef (force-push pending if you want to squash). Summary of what changed and how each comment was handled: **alamb's doc comments** (`mod.rs:541-561`) — applied via GitHub suggestions, then I tightened the rewording slightly to call out the new mid-row-group semantics. Public doc now reads "Safe to call at any time" and explicitly states what mid-row-group means. **alamb @ `mod.rs:868`** (link inner helper to public doc) — done. **alamb @ `mod.rs:876`** (return next RG in `DecodingRowGroup` state) — done. The `DecodingRowGroup` arm now delegates to `remaining_row_groups.peek_next_row_group()` and returns the row group `try_next_reader` will produce *after* the active one. New test `test_peek_next_row_group_during_decoding_row_group` covers this — drives `try_decode` mid-RG with a small batch size and asserts peek reports the next RG, not `None`. **alamb @ `remaining.rs:101`** (drift concern with `next_readable_row_group`) — addressed by extracting `classify_row_group` (predicate-and-budget Read/Skip rule) as a private helper that both walkers call. The duplicated branch is gone; new test `test_peek_next_row_group_does_not_mutate_state` calls peek repeatedly between reads and asserts each pair agrees, locking in the lock-step contract. **alamb @ `remaining.rs:121`** (avoid the selection clone) — replaced with a borrowed `PeekSelectionCursor` that walks `RowSelection::iter` with a cursor and counts selected rows per row group without disturbing the real selection. No more `selection.clone()` in peek. **adriangb @ `mod.rs:565`** (rename to `peek_remaining_row_groups`?) — keeping the single-value `peek_next_row_group` for now. The only consumer I have (DataFusion runtime RG pruning follow-up) needs exactly the next index, and you said *"maybe it's worth doing the simpler API for now and we can always add another one later"* — taking you up on that. Happy to add a separate `peek_remaining_row_groups` returning `&[usize]` in a follow-up when there's a concrete second use case. Existing 6 peek tests + 2 new tests all pass (46 push_decoder tests, 1223 parquet tests, clippy clean). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
