[GitHub] [arrow-rs] tustvold commented on a change in pull request #1021: Simplify parquet arror `RecordReader`

GitBox Mon, 13 Dec 2021 13:22:24 -0800


tustvold commented on a change in pull request #1021:
URL: https://github.com/apache/arrow-rs/pull/1021#discussion_r768131558




##########
File path: parquet/src/arrow/record_reader.rs
##########
@@ -107,21 +103,25 @@ impl<T: DataType> RecordReader<T> {
         loop {
             // Try to find some records from buffers that has been read into 
memory
             // but not counted as seen records.
-            records_read += self.split_records(num_records - records_read)?;
-
-            // Since page reader contains complete records, so if we reached 
end of a
-            // page reader, we should reach the end of a record
-            if end_of_column
-                && self.values_seen >= self.values_written
-                && self.in_middle_of_record
-            {
-                self.num_records += 1;
-                self.num_values = self.values_seen;
-                self.in_middle_of_record = false;
-                records_read += 1;
+            let (record_count, value_count) =
+                self.count_records(num_records - records_read);
+
+            self.num_records += record_count;
+            self.num_values += value_count;
+            records_read += record_count;
+
+            if records_read == num_records {
+                break;
             }
 
-            if (records_read >= num_records) || end_of_column {
+            if end_of_column {
+                // Since page reader contains complete records, if we reached 
end of a

Review comment:
       See comment below, page reader is a column chunk. So this is effectively 
saying that a record can't be split across row groups, which I think is 
guaranteed?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1021: Simplify parquet arror `RecordReader`

Reply via email to