etseidl commented on code in PR #9787:
URL: https://github.com/apache/arrow-rs/pull/9787#discussion_r3119880806


##########
parquet/src/encodings/decoding.rs:
##########
@@ -847,6 +847,30 @@ where
             self.values_left -= 1;
         }
 
+        // Terminal skip: caller is discarding all remaining values on this page.
+        // last_value will never be read again, so we can use O(1) arithmetic
+        // skips (BitReader::skip) instead of decoding through get_batch.
+        let terminal = to_skip >= self.values_left + skip;

Review Comment:
   I think this line and the following block can move before the preceding 
block. Then this test can just be `to_skip >= self.values_left`.
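
   To illustrate why the simpler test is equivalent, here is a minimal sketch. 
It assumes the preceding block consumes at most one value and, when it does, 
increments `skip` and decrements `values_left` together. `SkipState`, 
`terminal_after`, and `terminal_before` are hypothetical names used only for 
this illustration; they are not part of the decoder.

```rust
/// Hypothetical stand-in for the decoder's skip bookkeeping.
/// This is not the real arrow-rs type.
#[derive(Clone, Copy)]
struct SkipState {
    values_left: usize,
    /// Assumption: true while a header value has not been consumed yet.
    first_value_pending: bool,
}

/// The check as written in the PR, evaluated after the preceding block
/// may already have consumed one value.
fn terminal_after(mut s: SkipState, to_skip: usize) -> bool {
    let mut skip = 0;
    if s.first_value_pending && to_skip > 0 && s.values_left > 0 {
        s.first_value_pending = false;
        s.values_left -= 1;
        skip += 1;
    }
    // values_left dropped by exactly `skip`, so the sum is the original count.
    to_skip >= s.values_left + skip
}

/// The check as suggested, evaluated before anything has been consumed.
fn terminal_before(s: &SkipState, to_skip: usize) -> bool {
    to_skip >= s.values_left
}
```

   Under that assumption the two functions agree for every input, because 
`values_left + skip` after the preceding block equals `values_left` before it.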



##########
parquet/src/encodings/decoding.rs:
##########
@@ -847,6 +847,30 @@ where
             self.values_left -= 1;
         }
 
+        // Terminal skip: caller is discarding all remaining values on this page.
+        // last_value will never be read again, so we can use O(1) arithmetic
+        // skips (BitReader::skip) instead of decoding through get_batch.
+        let terminal = to_skip >= self.values_left + skip;
+
+        if terminal {
+            while skip < to_skip {

Review Comment:
   Since we're skipping the entire page, I think we can skip all of this logic 
and simply set `self.values_left` to 0. The bit reader is populated with a 
single page's worth of data, so it's safe to not run through the state machine 
here.
   
   This suggestion messes with one of the unit tests, but the test can be 
modified to not skip the entire page and still trigger the expected error.
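
   A rough sketch of the shape this could take, on a toy type rather than the 
real decoder. `ToyDecoder` and its `values_decoded` counter are hypothetical; 
the bit reader and per-block state are deliberately left out, on the assumption 
stated above that they hold only this page's data and are not read again.

```rust
/// Hypothetical, simplified decoder that models only the remaining-value
/// counter. It is not the arrow-rs decoder.
struct ToyDecoder {
    values_left: usize,
    /// Counts values that went through the slow (decoding) path.
    values_decoded: usize,
}

impl ToyDecoder {
    /// Skips up to `num_values` values and returns how many were skipped.
    fn skip(&mut self, num_values: usize) -> usize {
        let to_skip = num_values.min(self.values_left);

        // Terminal skip: the caller discards everything left on this page,
        // so no decoder state is advanced; the counter is simply zeroed.
        if to_skip >= self.values_left {
            let skipped = self.values_left;
            self.values_left = 0;
            return skipped;
        }

        // Non-terminal skip: stand-in for decoding through the values.
        for _ in 0..to_skip {
            self.values_decoded += 1;
        }
        self.values_left -= to_skip;
        to_skip
    }
}
```

   With 10 values on the page, `skip(3)` takes the slow path and leaves 7; a 
following `skip(100)` returns 7 without touching `values_decoded`.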



