tustvold commented on code in PR #3848:
URL: https://github.com/apache/arrow-rs/pull/3848#discussion_r1134671821
##########
parquet/src/arrow/arrow_reader/selection.rs:
##########
@@ -372,29 +371,63 @@ impl RowSelection {
self
}
+ /// Applies an offset to this [`RowSelection`], skipping the first `offset` selected rows
+ pub(crate) fn offset(mut self, offset: usize) -> Self {
+ if offset == 0 {
+ return self;
+ }
+
+ let mut selected_count = 0;
+ let mut skipped_count = 0;
+
+ // Find the index where the selector exceeds the row count
+ let find = self
+ .selectors
+ .iter()
+ .position(|selector| match selector.skip {
+ true => {
+ skipped_count += selector.row_count;
+ false
+ }
+ false => {
+ selected_count += selector.row_count;
+ selected_count > offset
+ }
+ });
+
+ let split_idx = match find {
+ Some(idx) => idx,
+ None => {
+ self.selectors.clear();
+ return self;
+ }
+ };
+
+ let mut selectors = Vec::with_capacity(self.selectors.len() - split_idx + 1);
+ selectors.push(RowSelector::skip(skipped_count + offset));
+ selectors.push(RowSelector::select(selected_count - offset));
Review Comment:
Yes, that is the correct behaviour, no? The offset is an offset into the
selected rows. So in this case you would expect the last 4 rows, skipping the
first 2 that were selected originally?
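
To make the semantics concrete, here is a minimal standalone sketch of the logic in the hunk above. It is not the crate's code: `RowSelector` is a simplified stand-in, `offset_selectors` is a hypothetical free function operating on a plain `Vec`, and the final `extend_from_slice` step is an assumption, since the quoted hunk ends before showing how the remaining selectors are handled.

```rust
/// Simplified stand-in for the crate's `RowSelector` (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

impl RowSelector {
    fn select(row_count: usize) -> Self {
        Self { row_count, skip: false }
    }

    fn skip(row_count: usize) -> Self {
        Self { row_count, skip: true }
    }
}

/// Hypothetical helper mirroring the hunk: drop the first `offset`
/// *selected* rows, leaving rows that were already skipped as skipped.
fn offset_selectors(selectors: Vec<RowSelector>, offset: usize) -> Vec<RowSelector> {
    if offset == 0 {
        return selectors;
    }

    let mut selected_count = 0;
    let mut skipped_count = 0;

    // Find the first selector at which the running total of selected rows
    // exceeds `offset`, accumulating skipped rows along the way.
    let find = selectors.iter().position(|selector| match selector.skip {
        true => {
            skipped_count += selector.row_count;
            false
        }
        false => {
            selected_count += selector.row_count;
            selected_count > offset
        }
    });

    let split_idx = match find {
        Some(idx) => idx,
        // Fewer than `offset + 1` selected rows in total: nothing remains.
        None => return Vec::new(),
    };

    let mut out = Vec::with_capacity(selectors.len() - split_idx + 1);
    // Everything before the split point collapses into a single skip.
    out.push(RowSelector::skip(skipped_count + offset));
    // The remainder of the selector that crossed the offset stays selected.
    out.push(RowSelector::select(selected_count - offset));
    // Assumed: selectors after the split point are carried over unchanged.
    out.extend_from_slice(&selectors[split_idx + 1..]);
    out
}
```

With a made-up input of `[skip(3), select(6)]` and an offset of 2, this yields `[skip(5), select(4)]`: the first 2 originally selected rows are folded into the skip and the last 4 remain selected, which matches the behaviour described in the comment.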