zhuqi-lucas commented on PR #10158:
URL: https://github.com/apache/arrow-rs/pull/10158#issuecomment-4775559633

   Addressed in 74435caef (force-push pending if you want to squash). Summary 
of what changed and how each comment was handled:
   
   **alamb's doc comments** (`mod.rs:541-561`) — applied via GitHub 
suggestions, then I tightened the rewording slightly to call out the new 
mid-row-group semantics. Public doc now reads "Safe to call at any time" and 
explicitly states what mid-row-group means.
   
   **alamb @ `mod.rs:868`** (link inner helper to public doc) — done.
   
   **alamb @ `mod.rs:876`** (return next RG in `DecodingRowGroup` state) — 
done. The `DecodingRowGroup` arm now delegates to 
`remaining_row_groups.peek_next_row_group()` and returns the row group 
`try_next_reader` will produce *after* the active one. New test 
`test_peek_next_row_group_during_decoding_row_group` covers this — drives 
`try_decode` mid-RG with a small batch size and asserts peek reports the next 
RG, not `None`.
   
   **alamb @ `remaining.rs:101`** (drift concern with 
`next_readable_row_group`) — addressed by extracting `classify_row_group` 
(predicate-and-budget Read/Skip rule) as a private helper that both walkers 
call. The duplicated branch is gone; new test 
`test_peek_next_row_group_does_not_mutate_state` calls peek repeatedly between 
reads and asserts each pair agrees, locking in the lock-step contract.
   
   **alamb @ `remaining.rs:121`** (avoid the selection clone) — replaced with a 
borrowed `PeekSelectionCursor` that walks `RowSelection::iter` with a cursor 
and counts selected rows per row group without disturbing the real selection. 
No more `selection.clone()` in peek.
   
   **adriangb @ `mod.rs:565`** (rename to `peek_remaining_row_groups`?) — 
keeping the single-value `peek_next_row_group` for now. The only consumer I 
have (DataFusion runtime RG pruning follow-up) needs exactly the next index, 
and you said *"maybe it's worth doing the simpler API for now and we can always 
add another one later"* — taking you up on that. Happy to add a separate 
`peek_remaining_row_groups` returning `&[usize]` in a follow-up when there's a 
concrete second use case.
   
   Existing 6 peek tests + 2 new tests all pass (46 push_decoder tests, 1223 
parquet tests, clippy clean).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to