tustvold commented on code in PR #1998:
URL: https://github.com/apache/arrow-rs/pull/1998#discussion_r913752851
##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -202,6 +203,37 @@ where
Ok(records_read)
}
+ /// Try to skip the next `num_records` rows
+ ///
+ /// # Returns
+ ///
+ /// Number of records skipped
+ pub fn skip_records(&mut self, num_records: usize) -> Result<usize> {
+ // First need to clear the buffer
+ let (buffered_records, buffered_values) = self.count_records(num_records);
+ self.num_records += buffered_records;
+ self.num_values += buffered_values;
+
+ self.consume_def_levels();
+ self.consume_rep_levels();
+ self.consume_record_data();
Review Comment:
> This is part of skip: we need to read the repetition and definition levels to skip records in the page (which may or may not have been read already).
Yes, this just consumes any data that has already been read into the internal
buffers of RecordReader.
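For a repeated column, deciding how much buffered data to consume comes down to counting complete records in the buffered repetition levels, roughly what the `count_records` call in the diff is for. The following is a minimal sketch, not the arrow-rs implementation: the helper name `count_buffered_records` is hypothetical, and it assumes the buffer starts on a record boundary and that the trailing record may continue in data not yet read, so that record is never counted.

```rust
/// Hypothetical helper: count complete records in a buffer of repetition levels.
///
/// Returns `(records, levels)`: how many complete records (at most `max_records`)
/// were found, and how many levels those records span. Assumes `rep_levels`
/// starts on a record boundary; the trailing record is treated as possibly
/// incomplete and is never counted.
fn count_buffered_records(rep_levels: &[i16], max_records: usize) -> (usize, usize) {
    if max_records == 0 {
        return (0, 0);
    }
    let mut records = 0;
    let mut end = 0;
    // Skip index 0: it opens the first record rather than closing one.
    for (idx, &level) in rep_levels.iter().enumerate().skip(1) {
        if level == 0 {
            // A repetition level of 0 starts a new record, completing the previous one.
            records += 1;
            end = idx;
            if records == max_records {
                break;
            }
        }
    }
    (records, end)
}
```

For example, with levels `[0, 1, 1, 0, 0, 1, 0]` and `max_records = 2`, the first two records span levels `0..4`, so those four levels could be dropped from the buffer.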
> This is also part of skip: when remaining > 0, I think we then start skipping at a new page.
Not necessarily. The only thing RecordReader needs to handle is skipping any
data that has already been read from ColumnReader into its own buffers. It can
then delegate to ColumnReader to skip the remaining rows, with no requirement
that this happens at a page boundary; ColumnReader must be able to handle any
case.
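This division of responsibility can be sketched as follows. `SkipColumn`, `RecordReaderSketch` and `MockColumn` are hypothetical stand-ins rather than the real `RecordReader`/`ColumnReader` types: the record reader first drains the records it has buffered, then hands whatever remains to the column reader, whose mock implementation is free to stop part-way through a page.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the lower-level column reader's skip capability.
trait SkipColumn {
    /// Skip up to `num_records` records, returning how many were actually skipped.
    fn skip_records(&mut self, num_records: usize) -> usize;
}

/// Hypothetical record reader that only tracks how many records it has buffered.
struct RecordReaderSketch<C: SkipColumn> {
    buffered_records: usize,
    column: C,
}

impl<C: SkipColumn> RecordReaderSketch<C> {
    fn skip_records(&mut self, num_records: usize) -> usize {
        // First drop records already read into this reader's own buffers.
        let from_buffer = self.buffered_records.min(num_records);
        self.buffered_records -= from_buffer;

        // Delegate the rest; no page alignment is required at this point.
        let remaining = num_records - from_buffer;
        if remaining == 0 {
            return from_buffer;
        }
        from_buffer + self.column.skip_records(remaining)
    }
}

/// Mock column reader: each entry is the number of unread records left in a page.
struct MockColumn {
    pages: VecDeque<usize>,
}

impl SkipColumn for MockColumn {
    fn skip_records(&mut self, num_records: usize) -> usize {
        let mut skipped = 0;
        while skipped < num_records {
            let Some(&page) = self.pages.front() else { break };
            let n = page.min(num_records - skipped);
            skipped += n;
            if n == page {
                // Page exhausted: move on to the next one.
                self.pages.pop_front();
            } else {
                // Stop part-way through the page; the next skip resumes here.
                self.pages[0] -= n;
            }
        }
        skipped
    }
}
```

With 3 buffered records and pages of 5 and 5 records, skipping 4 consumes the buffer and then 1 record from the middle of the first page, leaving pages of 4 and 5; the record reader never needs to know where the page boundaries fall.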