zhuqi-lucas commented on PR #9937:
URL: https://github.com/apache/arrow-rs/pull/9937#issuecomment-4415297373

   Thanks for the review @alamb — your concern about maintaining two 
near-identical reader paths was exactly right. I dug into the diff and ~95% of 
`ReverseSerializedPageReader` was just `SerializedPageReader::Pages` mode with 
`pop_back` instead of `pop_front`. Pushed `4c8b46e` collapsing the two:
   
   **API now:**
   
   ```rust
   let reader = SerializedPageReader::new(
       chunk_reader, meta, total_rows, Some(page_locations),
   )?
   .with_reverse_pages(true);   // ← new builder flag
   ```
   
   **What changed under the hood:**
   
   - `SerializedPageReaderState::Pages` gains a `reverse: bool` field
   - `with_reverse_pages(true)` flips the flag and seeds `page_index` to `len - 
1` so the page ordinal stays correct (this matters for encryption AAD — reverse 
mode emits the back-most page first, which has ordinal `N - 1` in the original 
forward order)
   - `get_next_page` / `peek_next_page` / `skip_next_page` / 
`peek_next_page_offset` branch `pop_front` ↔ `pop_back`, `front` ↔ `back`, and 
`+= 1` ↔ `saturating_sub(1)` under the flag
   - Dictionary handling is unchanged — it's still emitted first in both modes 
(data pages depend on the loaded dict)
   
   **The dedicated `reverse_serialized_reader.rs` module is gone** (~1080 lines 
deleted). Net change: **−190 lines source, single code path**. Encryption now 
works for reverse iteration too, which it didn't in the standalone reader.
   
   **On the dict-detection wrinkle I mentioned**: I considered the "pass a 
pre-reversed `Vec<PageLocation>`" route but it breaks the existing dict 
inference (which uses the gap between `chunk_start` and 
`page_locations[0].offset`). The flag-based approach keeps the existing builder 
logic untouched and only flips iteration direction at the per-call site, so I 
went that way. Happy to reconsider if you'd prefer the alternative.
   
   **Tests:** all 21 reverse-correctness tests are migrated to a 
`reverse_pages_tests` sub-module in `serialized_reader.rs`. Added one new test 
(`with_reverse_pages_is_noop_in_values_mode`) verifying the flag is harmless in 
`Values` mode. The bench is updated to call the new API; numbers are within 
noise of the pre-refactor result (~50x speedup for `n ≤ one page`, scaling down 
with `n` as expected).
   
   PR description has been updated to match.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to