[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2199: Cleanup record skipping logic and tests (#2158)

GitBox Wed, 27 Jul 2022 08:52:54 -0700


tustvold commented on code in PR #2199:
URL: https://github.com/apache/arrow-rs/pull/2199#discussion_r931233002



##########
parquet/src/arrow/array_reader/mod.rs:
##########
@@ -145,29 +145,36 @@ where
     Ok(records_read)
 }
 
-/// Uses `pages` to set up to `record_reader` 's `column_reader`
+/// Uses `record_reader` to skip up to `batch_size` records from`pages`
 ///
-/// If we skip records before all read operation,
-/// need set `column_reader` by `set_page_reader`
-/// for constructing `def_level_decoder` and `rep_level_decoder`.
-fn set_column_reader<V, CV>(
+/// Returns the number of records skipped, which can be less than `batch_size` if
+/// pages is exhausted
+fn skip_records<V, CV>(

Review Comment:
   I found this method somewhat confusing, and I think it could get stuck if
the column reader was set but exhausted (i.e. on a column chunk boundary).
#2198 will add test coverage for this case; the new code simply copies the
logic from `read_records` above, which is correct.
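
To illustrate the loop shape described in the comment, here is a minimal sketch using hypothetical stand-ins rather than the real parquet types. `MockRecordReader`, `set_chunk`, and the iterator of per-chunk record counts are assumptions made for this example; the actual code works with the crate's record reader and page iterator. The point is the control flow: try to skip, and whenever the current reader comes up short, move on to the next column chunk or stop if none remain.

```rust
/// Hypothetical stand-in for a record reader: it only tracks how many
/// records are left in the current column chunk.
struct MockRecordReader {
    remaining: usize,
}

impl MockRecordReader {
    /// Skips up to `n` records and returns how many were actually skipped.
    fn skip_records(&mut self, n: usize) -> usize {
        let skipped = n.min(self.remaining);
        self.remaining -= skipped;
        skipped
    }

    /// Points the reader at a new column chunk holding `records` records.
    fn set_chunk(&mut self, records: usize) {
        self.remaining = records;
    }
}

/// Skips up to `batch_size` records, advancing through `chunks` as needed.
/// Returns the number skipped, which is less than `batch_size` only when
/// `chunks` is exhausted.
fn skip_records(
    reader: &mut MockRecordReader,
    chunks: &mut dyn Iterator<Item = usize>,
    batch_size: usize,
) -> usize {
    let mut skipped = 0;
    while skipped < batch_size {
        let to_skip = batch_size - skipped;
        let skipped_once = reader.skip_records(to_skip);
        skipped += skipped_once;

        // The current reader came up short: it is unset or exhausted.
        if skipped_once < to_skip {
            match chunks.next() {
                // Continue from the next column chunk.
                Some(records) => reader.set_chunk(records),
                // No chunks left either, so stop instead of looping forever.
                None => break,
            }
        }
    }
    skipped
}
```

With chunks of 3, 0, and 4 records and a reader that starts empty, successive calls with a batch size of 5 skip 5, then 2, then 0 records. The empty chunk in the middle corresponds to the boundary case the comment is concerned about: the loop advances past it rather than stalling.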



