Ted-Jiang commented on code in PR #1998:
URL: https://github.com/apache/arrow-rs/pull/1998#discussion_r913362921
##########
parquet/src/arrow/record_reader/mod.rs:
##########
@@ -202,6 +203,37 @@ where
Ok(records_read)
}
+ /// Try to skip the next `num_records` rows
+ ///
+ /// # Returns
+ ///
+ /// Number of records skipped
+ pub fn skip_records(&mut self, num_records: usize) -> Result<usize> {
+ // First need to clear the buffer
+ let (buffered_records, buffered_values) = self.count_records(num_records);
+ self.num_records += buffered_records;
+ self.num_values += buffered_values;
+
+ self.consume_def_levels();
+ self.consume_rep_levels();
+ self.consume_record_data();
Review Comment:
👍 Nice write-up! That saves me some time 😄
So I've got it. A few more specific questions:
This is one part of the skip: we need to read the `rp` and `dp` (repetition and definition levels) to skip the records in the page that are already buffered (which may or may not have been read yet).
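```rust
// Count the records (and level entries) already buffered, up to `num_records`
let (buffered_records, buffered_values) = self.count_records(num_records);
self.num_records += buffered_records;
self.num_values += buffered_values;
// Split off (and discard) the buffered levels, values and null bitmap for those records
self.consume_def_levels();
self.consume_rep_levels();
self.consume_record_data();
self.consume_bitmap();
self.reset();
// `count_records` returns at most `num_records`, so this cannot underflow
let remaining = num_records - buffered_records;
```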
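To make the first part concrete, here is a minimal sketch of counting buffered records from repetition levels, relying on the Parquet rule that a repetition level of 0 starts a new record. The function `count_buffered_records` and its `end_of_data` flag are hypothetical (this is not the arrow-rs `count_records` implementation), and the sketch assumes a repeated column whose buffer begins on a record boundary.

```rust
/// Hypothetical helper (not the arrow-rs API): counts complete records at the
/// front of a buffered repetition-level slice, stopping after `max_records`.
/// Returns `(records, level_entries)`: how many records were found and how
/// many level entries they span.
/// Assumes `rep_levels` starts on a record boundary (its first entry is 0).
fn count_buffered_records(
    rep_levels: &[i16],
    max_records: usize,
    end_of_data: bool,
) -> (usize, usize) {
    if max_records == 0 || rep_levels.is_empty() {
        return (0, 0);
    }
    let mut records = 0;
    // Index where the record currently being scanned began.
    let mut record_start = 0;
    for (idx, &level) in rep_levels.iter().enumerate().skip(1) {
        // A repetition level of 0 starts a new record, so the record
        // that began at `record_start` is now complete.
        if level == 0 {
            records += 1;
            record_start = idx;
            if records == max_records {
                return (records, idx);
            }
        }
    }
    if end_of_data {
        // Nothing follows this buffer, so the trailing record is complete too.
        (records + 1, rep_levels.len())
    } else {
        // The trailing record may continue on the next page; leave it buffered.
        (records, record_start)
    }
}
```

For example, with levels `[0, 1, 1, 0, 0, 1]` and `max_records = 2` it reports 2 records spanning 4 level entries. The trailing record is only counted when `end_of_data` is set, because otherwise the next page may still continue it.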
This is the other part of the skip: when `remaining > 0`, I think the skip starts at a new page.
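```rust
// Every requested record was already buffered, so nothing is left to skip
if remaining == 0 {
    return Ok(buffered_records);
}
// Otherwise ask the column reader to skip the rest; without a reader, nothing more is skipped
let skipped = match self.column_reader.as_mut() {
    Some(column_reader) => column_reader.skip_records(remaining)?,
    None => 0,
};
```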
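Taken together, the two parts amount to a two-phase skip: take what the buffer can provide, then delegate only the outstanding `num_records - buffered_records` to the column reader. The sketch below illustrates that control flow with hypothetical types (`SkipRecords`, `CountingReader` and `two_phase_skip` are not arrow-rs APIs); clamping with `min` keeps the subtraction from underflowing.

```rust
/// Hypothetical stand-in for the underlying column reader, which can skip
/// records in the encoded pages without decoding them.
trait SkipRecords {
    /// Skips up to `n` records and returns how many were actually skipped.
    fn skip_records(&mut self, n: usize) -> usize;
}

/// Toy reader that only tracks how many records remain in the column.
struct CountingReader {
    records_left: usize,
}

impl SkipRecords for CountingReader {
    fn skip_records(&mut self, n: usize) -> usize {
        let skipped = n.min(self.records_left);
        self.records_left -= skipped;
        skipped
    }
}

/// Phase 1 drops records already buffered; phase 2 asks the column reader
/// (if any) to skip whatever is still outstanding.
fn two_phase_skip<R: SkipRecords>(
    buffered_records: usize,
    num_records: usize,
    column_reader: Option<&mut R>,
) -> usize {
    // Never count more buffered records than were requested.
    let from_buffer = buffered_records.min(num_records);
    let remaining = num_records - from_buffer;
    if remaining == 0 {
        return from_buffer;
    }
    let skipped = match column_reader {
        Some(reader) => reader.skip_records(remaining),
        None => 0,
    };
    from_buffer + skipped
}
```

Returning `from_buffer + skipped` lets the caller detect a short skip, e.g. when the column runs out of records before `num_records` have been skipped.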
--
This is an automated message from the Apache Git Service.