tustvold commented on a change in pull request #1021:
URL: https://github.com/apache/arrow-rs/pull/1021#discussion_r768130610
##########
File path: parquet/src/arrow/record_reader.rs
##########
@@ -107,21 +103,25 @@ impl<T: DataType> RecordReader<T> {
loop {
// Try to find some records from buffers that has been read into memory
// but not counted as seen records.
- records_read += self.split_records(num_records - records_read)?;
-
- // Since page reader contains complete records, so if we reached end of a
- // page reader, we should reach the end of a record
- if end_of_column
- && self.values_seen >= self.values_written
- && self.in_middle_of_record
- {
- self.num_records += 1;
- self.num_values = self.values_seen;
- self.in_middle_of_record = false;
- records_read += 1;
+ let (record_count, value_count) =
+ self.count_records(num_records - records_read);
+
+ self.num_records += record_count;
+ self.num_values += value_count;
+ records_read += record_count;
+
+ if records_read == num_records {
+ break;
}
- if (records_read >= num_records) || end_of_column {
+ if end_of_column {
Review comment:
Ehehe, `PageReader` is actually a column chunk... So the end of a
`PageReader` is the end of a row group (for this column), not the end of a
page. Confusingly, `PageIterator` is an iterator of `PageReader`s, which are
themselves iterators of `Page` :laughing:
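
To make the nesting concrete, here is a rough sketch with toy types. `ToyPage`, `ToyPageReader`, `ToyPageIterator`, `chunk` and `values_per_chunk` are all hypothetical names made up for illustration; they are not the parquet crate's API, they only mirror the shape described above (outer iterator yields one reader per row group, each reader yields that column chunk's pages).

```rust
/// Toy stand-in for a `Page`: only tracks how many values it holds.
struct ToyPage {
    num_values: usize,
}

/// Toy stand-in for a `PageReader`: iterates the pages of ONE column chunk.
/// Exhausting it means the column chunk (row group) ended, not a single page.
struct ToyPageReader {
    pages: std::vec::IntoIter<ToyPage>,
}

impl Iterator for ToyPageReader {
    type Item = ToyPage;

    fn next(&mut self) -> Option<ToyPage> {
        self.pages.next()
    }
}

/// Toy stand-in for a `PageIterator`: yields one reader per row group.
struct ToyPageIterator {
    readers: std::vec::IntoIter<ToyPageReader>,
}

impl Iterator for ToyPageIterator {
    type Item = ToyPageReader;

    fn next(&mut self) -> Option<ToyPageReader> {
        self.readers.next()
    }
}

/// Hypothetical helper: build a reader from a list of per-page value counts.
fn chunk(page_sizes: &[usize]) -> ToyPageReader {
    let pages: Vec<ToyPage> = page_sizes
        .iter()
        .map(|&n| ToyPage { num_values: n })
        .collect();
    ToyPageReader { pages: pages.into_iter() }
}

/// Hypothetical helper: total values per column chunk. The inner iterator
/// running dry is the "end of row group" boundary for this column.
fn values_per_chunk(column: ToyPageIterator) -> Vec<usize> {
    column
        .map(|reader| reader.map(|page| page.num_values).sum::<usize>())
        .collect()
}
```

With two toy row groups holding pages of 3 + 2 values and 4 values, `values_per_chunk` would report one total per column chunk rather than one per page, which is the boundary the `end_of_column` check is really seeing.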