tustvold commented on code in PR #1998:
URL: https://github.com/apache/arrow-rs/pull/1998#discussion_r913752851


##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -202,6 +203,37 @@ where
         Ok(records_read)
     }
 
+    /// Try to skip the next `num_records` rows
+    ///
+    /// # Returns
+    ///
+    /// Number of records skipped
+    pub fn skip_records(&mut self, num_records: usize) -> Result<usize> {
+        // First need to clear the buffer
+        let (buffered_records, buffered_values) = self.count_records(num_records);
+        self.num_records += buffered_records;
+        self.num_values += buffered_values;
+
+        self.consume_def_levels();
+        self.consume_rep_levels();
+        self.consume_record_data();

Review Comment:
   > This is part of the skip: we need to read the rep and def levels to skip 
   > some records in the page (which may or may not have been read already).
   
   Yes, this just consumes any data that has already been read into the 
   internal buffers of RecordReader.
   
   > This is also part of the skip: when remaining > 0, I think we start 
   > skipping at a new page.
   
   Not necessarily. The only thing RecordReader needs to handle is skipping any 
   data that has already been read from ColumnReader into its own buffers. It 
   can then delegate to ColumnReader to skip the remaining rows, with no 
   requirement that this happens at a page boundary: ColumnReader must be able 
   to handle any case.
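
   To illustrate the split of responsibilities described above, here is a 
   minimal sketch. `MockRecordReader` and `MockColumnReader` are hypothetical 
   stand-ins, not the actual types in the PR, and they track only record 
   counts rather than real level or value buffers.

```rust
/// Hypothetical stand-in for ColumnReader: it only tracks how many records
/// are left, and can skip any number of them regardless of page boundaries.
struct MockColumnReader {
    remaining: usize,
}

impl MockColumnReader {
    /// Skips up to `n` records and returns how many were actually skipped.
    fn skip_records(&mut self, n: usize) -> usize {
        let skipped = n.min(self.remaining);
        self.remaining -= skipped;
        skipped
    }
}

/// Hypothetical stand-in for RecordReader: `buffered` counts records that
/// were already read from the column reader into internal buffers.
struct MockRecordReader {
    buffered: usize,
    column_reader: MockColumnReader,
}

impl MockRecordReader {
    /// Skips up to `num_records` records and returns how many were skipped.
    fn skip_records(&mut self, num_records: usize) -> usize {
        // Step 1: discard records that are already sitting in our own buffers.
        let from_buffer = num_records.min(self.buffered);
        self.buffered -= from_buffer;

        // Step 2: delegate whatever is left to the column reader. There is
        // no page-boundary requirement here; the column reader must cope.
        let remaining = num_records - from_buffer;
        let from_column = if remaining > 0 {
            self.column_reader.skip_records(remaining)
        } else {
            0
        };

        from_buffer + from_column
    }
}
```

   With 3 buffered records and 10 left in the column reader, skipping 5 
   drains the buffer first and then asks the column reader to skip only the 
   remaining 2.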


