sunchao commented on a change in pull request #1021:
URL: https://github.com/apache/arrow-rs/pull/1021#discussion_r768133537
##########
File path: parquet/src/arrow/record_reader.rs
##########
@@ -107,21 +103,25 @@ impl<T: DataType> RecordReader<T> {
loop {
// Try to find some records from buffers that has been read into memory
// but not counted as seen records.
- records_read += self.split_records(num_records - records_read)?;
-
- // Since page reader contains complete records, so if we reached end of a
- // page reader, we should reach the end of a record
- if end_of_column
- && self.values_seen >= self.values_written
- && self.in_middle_of_record
- {
- self.num_records += 1;
- self.num_values = self.values_seen;
- self.in_middle_of_record = false;
- records_read += 1;
+ let (record_count, value_count) =
+ self.count_records(num_records - records_read);
+
+ self.num_records += record_count;
+ self.num_values += value_count;
+ records_read += record_count;
+
+ if records_read == num_records {
+ break;
}
- if (records_read >= num_records) || end_of_column {
+ if end_of_column {
Review comment:
Ah got it, thanks 🤦. It all makes sense now!