sahuagin commented on PR #9787: URL: https://github.com/apache/arrow-rs/pull/9787#issuecomment-4308090947
Thanks @etseidl — the panic test is conclusive. I traced `skip_records` → `skip_values` → `Decoder::skip` and confirmed the only path to a terminal skip at the decoder level is via a nested schema whose page metadata lacks `num_rows`. `GenericColumnReader::skip_records` (parquet/src/column/reader.rs:311-317) handles full-page skips via `page_reader.skip_next_page()`, which bypasses the decoder entirely, so the terminal condition `to_skip >= self.values_left` is never produced by standard callers. `scan_filtered` in #9788 doesn't rescue this either — it has its own miniblock loop and doesn't call `Decoder::skip`. The structural refactor from the earlier round of review (moving the fast-path before `first_value.take()` and just setting `values_left = 0`) is in place and correct; the optimization itself is simply unreachable from current callers. Rather than construct a synthetic benchmark for the narrow nested-missing-metadata corner, I'll withdraw this PR and leave the branch `pr/terminal-skip` available if a caller materializes in the future. #9786 and #9788 cover the work that is reachable today. Closing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
