etseidl commented on code in PR #9787:
URL: https://github.com/apache/arrow-rs/pull/9787#discussion_r3119880806
##########
parquet/src/encodings/decoding.rs:
##########
@@ -847,6 +847,30 @@ where
self.values_left -= 1;
}
+ // Terminal skip: caller is discarding all remaining values on this page.
+ // last_value will never be read again, so we can use O(1) arithmetic
+ // skips (BitReader::skip) instead of decoding through get_batch.
+ let terminal = to_skip >= self.values_left + skip;
Review Comment:
I think this line and the following block can move before the preceding
block. The test can then be simplified to `to_skip >= self.values_left`.
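
As a rough illustration of why the two forms of the test agree, here is a minimal sketch. It assumes the preceding block consumes at most one value and updates `skip` and `values_left` together (as the `self.values_left -= 1;` context line suggests). `checks_agree` and `first_value_pending` are hypothetical names for this sketch, not part of arrow-rs.

```rust
/// Hypothetical model comparing the two placements of the terminal check.
/// Assumption: the preceding block consumes at most one value, incrementing
/// `skip` and decrementing `values_left` in the same step.
fn checks_agree(values_left: usize, to_skip: usize, first_value_pending: bool) -> bool {
    // Check placed BEFORE the preceding block: nothing has been consumed yet.
    let before = to_skip >= values_left;

    // Simulate the preceding block on local copies of the state.
    let mut left = values_left;
    let mut skip = 0usize;
    if first_value_pending && to_skip > 0 && left > 0 {
        skip += 1;
        left -= 1;
    }

    // Check placed AFTER the block, in the form used by the diff.
    let after = to_skip >= left + skip;

    before == after
}
```

Under that assumption `left + skip` always equals the original `values_left`, so the `+ skip` term is only needed because the check runs after the counter has been decremented.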
##########
parquet/src/encodings/decoding.rs:
##########
@@ -847,6 +847,30 @@ where
self.values_left -= 1;
}
+ // Terminal skip: caller is discarding all remaining values on this page.
+ // last_value will never be read again, so we can use O(1) arithmetic
+ // skips (BitReader::skip) instead of decoding through get_batch.
+ let terminal = to_skip >= self.values_left + skip;
+
+ if terminal {
+ while skip < to_skip {
Review Comment:
Since we're skipping the entire page, I think we can skip all of this logic
and simply set `self.values_left` to 0. The bit reader is populated with a
single page's worth of data, so it's safe to not run through the state machine
here.
This suggestion messes with one of the unit tests, but the test can be
modified to not skip the entire page and still trigger the expected error.
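
The shortcut described above could look roughly like the following sketch. `ToyPageDecoder` and its `decoded` counter are hypothetical stand-ins for this illustration, not the real arrow-rs decoder. The sketch assumes, as the comment states, that the reader holds exactly one page of data, so abandoning the remaining values leaves no state that a later call depends on.

```rust
/// Hypothetical stand-in for a per-page decoder; not the arrow-rs type.
struct ToyPageDecoder {
    /// Values not yet consumed on the current page.
    values_left: usize,
    /// Counts values that had to be walked one by one (for illustration only).
    decoded: usize,
}

impl ToyPageDecoder {
    fn new(values_on_page: usize) -> Self {
        Self { values_left: values_on_page, decoded: 0 }
    }

    /// Skips up to `num_values` values and returns how many were skipped.
    fn skip(&mut self, num_values: usize) -> usize {
        let to_skip = num_values.min(self.values_left);

        // Terminal skip: the caller discards everything left on the page,
        // so just zero the counter instead of walking the state machine.
        if to_skip >= self.values_left {
            self.values_left = 0;
            return to_skip;
        }

        // Partial skip: values still have to be stepped through, because
        // later reads depend on the decoder state they leave behind.
        for _ in 0..to_skip {
            self.decoded += 1;
            self.values_left -= 1;
        }
        to_skip
    }
}
```

In this model a whole-page skip never touches `decoded`, while a partial skip still walks each value. A test that expects an error from the decoding path would therefore need a partial skip to reach it.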