sunchao commented on a change in pull request #1031:
URL: https://github.com/apache/arrow-rs/pull/1031#discussion_r768925684
##########
File path: parquet/src/arrow/array_reader.rs
##########
@@ -100,6 +100,36 @@ pub trait ArrayReader {
     fn get_rep_levels(&self) -> Option<&[i16]>;
 }
+/// Uses `record_reader` to read up to `batch_size` records from `pages`
+///
+/// Returns the number of records read, which can be less than batch_size if
+/// pages is exhausted.
+fn read_records<T: DataType>(
+    record_reader: &mut RecordReader<T>,
+    pages: &mut dyn PageIterator,
+    batch_size: usize,
+) -> Result<usize> {
+    let mut records_read = 0usize;
+    while records_read < batch_size {
+        let records_to_read = batch_size - records_read;
+
+        let records_read_once = record_reader.read_records(records_to_read)?;
+        records_read += records_read_once;
+
+        // Record reader exhausted
+        if records_read_once < records_to_read {
+            if let Some(page_reader) = pages.next() {
+                // Read from new page reader (i.e. column chunk)
+                record_reader.set_page_reader(page_reader?)?;
Review comment:
Got it, thanks!
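
For readers following the thread without the full file, the loop in the quoted hunk can be sketched in a self-contained form. This is only an illustration under stated assumptions: `MockChunk`, `MockRecordReader`, `set_chunk`, and `read_up_to` are hypothetical stand-ins and not part of the parquet crate, error handling with `Result` is omitted, and the `break` taken when no chunks remain is an assumption, since the hunk above is cut off before that branch.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a column chunk: only tracks how many records remain.
struct MockChunk {
    remaining: usize,
}

/// Hypothetical stand-in for `RecordReader`: reads from the current chunk only.
struct MockRecordReader {
    current: Option<MockChunk>,
}

impl MockRecordReader {
    fn new() -> Self {
        Self { current: None }
    }

    /// Reads at most `n` records from the current chunk and returns how many
    /// were actually read (zero if no chunk has been set yet).
    fn read_records(&mut self, n: usize) -> usize {
        match self.current.as_mut() {
            Some(chunk) => {
                let take = n.min(chunk.remaining);
                chunk.remaining -= take;
                take
            }
            None => 0,
        }
    }

    /// Replaces the current chunk, mirroring the role of `set_page_reader`.
    fn set_chunk(&mut self, chunk: MockChunk) {
        self.current = Some(chunk);
    }
}

/// Reads up to `batch_size` records, moving on to the next chunk whenever the
/// current one returns fewer records than requested.
fn read_up_to(
    reader: &mut MockRecordReader,
    chunks: &mut VecDeque<MockChunk>,
    batch_size: usize,
) -> usize {
    let mut records_read = 0;
    while records_read < batch_size {
        let to_read = batch_size - records_read;
        let read_once = reader.read_records(to_read);
        records_read += read_once;

        // A short read means the current chunk is exhausted.
        if read_once < to_read {
            match chunks.pop_front() {
                Some(next) => reader.set_chunk(next),
                // Assumption: with no chunks left, return a partial batch.
                None => break,
            }
        }
    }
    records_read
}
```

With two mock chunks holding 3 and 4 records, successive calls with a batch size of 5 would span the chunk boundary on the first call and return a short batch once the input runs out.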