tustvold commented on a change in pull request #1021:
URL: https://github.com/apache/arrow-rs/pull/1021#discussion_r768112403
##########
File path: parquet/src/arrow/record_reader.rs
##########
@@ -107,21 +103,25 @@ impl<T: DataType> RecordReader<T> {
loop {
// Try to find some records from buffers that has been read into memory
// but not counted as seen records.
- records_read += self.split_records(num_records - records_read)?;
-
- // Since page reader contains complete records, so if we reached end of a
- // page reader, we should reach the end of a record
- if end_of_column
- && self.values_seen >= self.values_written
- && self.in_middle_of_record
- {
- self.num_records += 1;
- self.num_values = self.values_seen;
- self.in_middle_of_record = false;
- records_read += 1;
+ let (record_count, value_count) =
+ self.count_records(num_records - records_read);
+
+ self.num_records += record_count;
Review comment:
I think this would leave `RecordReader` in a strange state if
`read_one_batch` returned an error, as `self.num_values` would have been updated
but not `self.num`. I can't move the `self.num_values` update to match, because
`count_records` uses it.
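
To illustrate the concern, here is a minimal sketch. It does not use the real `RecordReader`: `Counters`, `fallible_read`, `advance_split` and `advance_together` are hypothetical names made up for this example, and `fallible_read` only stands in for a fallible step such as `read_one_batch`. It assumes the two counters are meant to move in step.

```rust
// Hypothetical, simplified stand-in for the two counters discussed above.
// This is NOT the parquet crate's RecordReader.
#[derive(Debug, Default, PartialEq)]
struct Counters {
    num_records: usize,
    num_values: usize,
}

impl Counters {
    // Stand-in for a fallible read step (assumption: it can return an error).
    fn fallible_read(fail: bool) -> Result<usize, String> {
        if fail {
            Err("read failed".to_string())
        } else {
            Ok(0)
        }
    }

    // Updates num_values, runs the fallible step, and only then updates
    // num_records. An error in between leaves the counters out of step.
    fn advance_split(
        &mut self,
        records: usize,
        values: usize,
        fail: bool,
    ) -> Result<(), String> {
        self.num_values += values;
        Self::fallible_read(fail)?;
        self.num_records += records;
        Ok(())
    }

    // Updates both counters before the fallible step, so an error cannot
    // separate them.
    fn advance_together(
        &mut self,
        records: usize,
        values: usize,
        fail: bool,
    ) -> Result<(), String> {
        self.num_records += records;
        self.num_values += values;
        Self::fallible_read(fail)?;
        Ok(())
    }
}
```

With `fail = true`, `advance_split(2, 10, true)` returns an error and leaves `num_values` at 10 while `num_records` is still 0, whereas `advance_together` leaves both counters updated.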