sahuagin commented on PR #9787:
URL: https://github.com/apache/arrow-rs/pull/9787#issuecomment-4308090947

   Thanks @etseidl — the panic test is conclusive.
   
   I traced `skip_records` → `skip_values` → `Decoder::skip` and confirmed the 
only path to a terminal skip at the decoder level is via a nested schema whose 
page metadata lacks `num_rows`. `GenericColumnReader::skip_records` 
(parquet/src/column/reader.rs:311-317) handles full-page skips via 
`page_reader.skip_next_page()`, which bypasses the decoder entirely, so the 
terminal condition `to_skip >= self.values_left` is never produced by standard 
callers.
   
   `scan_filtered` in #9788 doesn't rescue this either — it has its own 
miniblock loop and doesn't call `Decoder::skip`.
   
   The structural refactor from the earlier round of review (moving the 
fast-path before `first_value.take()` and just setting `values_left = 0`) is in 
place and correct; the optimization itself is simply unreachable from current 
callers. Rather than construct a synthetic benchmark for the narrow 
nested-missing-metadata corner, I'll withdraw this PR and leave the branch 
`pr/terminal-skip` available if a caller materializes in the future. #9786 and 
#9788 cover the work that is reachable today.
   
   Closing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to