tustvold commented on code in PR #2199:
URL: https://github.com/apache/arrow-rs/pull/2199#discussion_r931233002

##########
parquet/src/arrow/array_reader/mod.rs:
##########
@@ -145,29 +145,36 @@ where
     Ok(records_read)
 }

-/// Uses `pages` to set up to `record_reader` 's `column_reader`
+/// Uses `record_reader` to skip up to `batch_size` records from`pages`
 ///
-/// If we skip records before all read operation,
-/// need set `column_reader` by `set_page_reader`
-/// for constructing `def_level_decoder` and `rep_level_decoder`.
-fn set_column_reader<V, CV>(
+/// Returns the number of records skipped, which can be less than `batch_size` if
+/// pages is exhausted
+fn skip_records<V, CV>(

Review Comment:
   I found this method somewhat confusing, and I think it could potentially get stuck if the column reader was set but exhausted (i.e. on a column chunk boundary). #2198 will add some test coverage of this case, and this just copies the logic from `read_records` above, which is correct.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
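
To illustrate the column chunk boundary case raised in the review comment, here is a minimal, self-contained sketch of a skip loop that advances to the next chunk whenever the current reader runs dry. It is a toy model, not the arrow-rs implementation: `ToyRecordReader`, `set_chunk`, and the use of plain record counts in place of page readers are all hypothetical names and simplifications introduced for this example.

```rust
/// Hypothetical stand-in for a record reader positioned on one column chunk.
/// A chunk is modelled only by the number of records it still holds.
struct ToyRecordReader {
    remaining: usize,
}

impl ToyRecordReader {
    /// Skips up to `n` records from the current chunk and returns how many
    /// were actually skipped.
    fn skip(&mut self, n: usize) -> usize {
        let skipped = n.min(self.remaining);
        self.remaining -= skipped;
        skipped
    }

    /// Points the reader at a new chunk holding `records` records.
    fn set_chunk(&mut self, records: usize) {
        self.remaining = records;
    }
}

/// Skips up to `batch_size` records, moving to the next chunk whenever the
/// current one is exhausted. Returns fewer than `batch_size` only when
/// `chunks` has no more chunks to offer.
fn skip_records(
    reader: &mut ToyRecordReader,
    chunks: &mut impl Iterator<Item = usize>,
    batch_size: usize,
) -> usize {
    let mut skipped = 0;
    while skipped < batch_size {
        let to_skip = batch_size - skipped;
        let skipped_once = reader.skip(to_skip);
        skipped += skipped_once;

        // A short skip means the current chunk is exhausted. This also covers
        // a reader that was already set but empty (a chunk boundary).
        if skipped_once < to_skip {
            match chunks.next() {
                Some(records) => reader.set_chunk(records),
                // No chunks left: stop instead of looping forever.
                None => break,
            }
        }
    }
    skipped
}

fn main() {
    // The reader starts "set but exhausted", i.e. on a chunk boundary.
    let mut reader = ToyRecordReader { remaining: 0 };
    let mut chunks = vec![3usize, 4].into_iter();
    let skipped = skip_records(&mut reader, &mut chunks, 5);
    println!("skipped {} records", skipped);
}
```

The point of the sketch is that every iteration either makes progress, advances to a new chunk, or breaks, so an exhausted reader cannot cause the loop to spin.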