etseidl commented on code in PR #9769:
URL: https://github.com/apache/arrow-rs/pull/9769#discussion_r3113974576


##########
parquet/src/encodings/decoding.rs:
##########
@@ -847,6 +882,30 @@ where
             self.values_left -= 1;
         }
 
+        // Terminal skip: caller is discarding all remaining values on this page.
+        // last_value will never be read again, so we can use O(1) arithmetic
+        // skips (BitReader::skip) instead of decoding through get_batch.
+        let terminal = to_skip >= self.values_left + skip;

Review Comment:
   I think all of the "terminal" logic can move before taking `first_value`. If 
`terminal` is true, we don't even need to take `first_value`.
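
   A rough sketch of why the check can be hoisted, assuming that taking `first_value` does `skip += 1` and `self.values_left -= 1` as the hunk context suggests. The function names are hypothetical and only illustrate the arithmetic; they are not part of the PR.

```rust
/// Terminal check as written in the diff, evaluated after `first_value`
/// has been consumed (so `skip` may already be 1).
/// Hypothetical free function; in the PR this is an inline expression.
fn terminal_after_first_value(to_skip: usize, values_left: usize, skip: usize) -> bool {
    to_skip >= values_left + skip
}

/// Same check hoisted above the `first_value` handling, where nothing has
/// been consumed yet and `skip` is still 0.
fn terminal_hoisted(to_skip: usize, values_left: usize) -> bool {
    to_skip >= values_left
}

/// Compares the two forms for one starting state, assuming (not confirmed
/// by the diff) that consuming `first_value` moves exactly one value from
/// `values_left` to `skip`.
fn checks_agree(to_skip: usize, values_left: usize) -> bool {
    if values_left == 0 {
        // Nothing to consume; both forms reduce to `to_skip >= 0`.
        return terminal_hoisted(to_skip, 0) == terminal_after_first_value(to_skip, 0, 0);
    }
    let after = terminal_after_first_value(to_skip, values_left - 1, 1);
    after == terminal_hoisted(to_skip, values_left)
}
```

   Under that assumption `values_left + skip` is unchanged by consuming `first_value`, so both forms give the same answer and the hoisted one avoids touching `first_value` on the terminal path.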



##########
parquet/src/encodings/decoding.rs:
##########
@@ -862,55 +921,191 @@ where
             let bit_width = self.mini_block_bit_widths[self.mini_block_idx] as usize;
             self.check_bit_width(bit_width)?;
             let mini_block_to_skip = self.mini_block_remaining.min(to_skip - skip);
-            let mini_block_should_skip = mini_block_to_skip;
-
-            let skip_count = self
-                .bit_reader
-                .get_batch(&mut skip_buffer[0..mini_block_to_skip], bit_width);
 
-            if skip_count != mini_block_to_skip {
-                return Err(general_err!(
-                    "Expected to skip {} values from mini block got {}.",
-                    mini_block_batch_size,
-                    skip_count
-                ));
-            }
-
-            // see commentary in self.get() above regarding optimizations
             let min_delta = self.min_delta.as_i64()?;
             if bit_width == 0 {
-                // if min_delta == 0, there's nothing to do. self.last_value is unchanged
+                // All remainders are zero: every delta equals min_delta exactly.
+                // Advance last_value by n * min_delta with no bit reads.

Review Comment:
   The new comments here do not address the `min_delta == 0` case.
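
   As an illustration only, the two cases could be documented along these lines. `advance_zero_width` is a hypothetical helper (not in arrow-rs), and the use of wrapping arithmetic is an assumption rather than something this hunk shows.

```rust
/// Hypothetical helper mirroring the `bit_width == 0` branch: every delta in
/// the mini block equals `min_delta`, so no bits need to be read.
fn advance_zero_width(last_value: i64, min_delta: i64, n: usize) -> i64 {
    if min_delta == 0 {
        // min_delta == 0: there is nothing to do, last_value is unchanged.
        return last_value;
    }
    // min_delta != 0: advance last_value by n * min_delta in one step.
    // Wrapping arithmetic is an assumption made for this sketch.
    last_value.wrapping_add(min_delta.wrapping_mul(n as i64))
}
```

   The early return is not required for correctness, since `n * 0` is 0 anyway; it only gives the `min_delta == 0` note from the removed comment a place to live.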



##########
parquet/src/encodings/decoding.rs:
##########
@@ -847,6 +882,30 @@ where
             self.values_left -= 1;
         }
 
+        // Terminal skip: caller is discarding all remaining values on this page.
+        // last_value will never be read again, so we can use O(1) arithmetic
+        // skips (BitReader::skip) instead of decoding through get_batch.
+        let terminal = to_skip >= self.values_left + skip;
+
+        if terminal {
+            while skip < to_skip {

Review Comment:
   I think this can simply set `self.values_left` to 0, and perhaps take 
`first_value` just in case. The only reason for stepping through the headers is 
to do validation, but if we're skipping anyway, I think we can just ignore 
invalid data.
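
   Something like the following is what I have in mind. It is a sketch against a stand-in struct: `PageState` and `skip_terminal` are hypothetical names, and the real decoder has more state than this.

```rust
/// Stand-in for the two decoder fields that matter here (hypothetical type).
struct PageState {
    values_left: usize,
    first_value: Option<i64>,
}

/// Returns `Some(n)` with the number of values skipped when the request
/// drains the page, or `None` when the caller should fall back to the
/// regular mini-block loop.
fn skip_terminal(state: &mut PageState, num_values: usize) -> Option<usize> {
    if num_values < state.values_left {
        return None; // not a terminal skip
    }
    let skipped = state.values_left;
    // No block or mini-block headers are walked, so invalid data that
    // follows is deliberately not validated.
    state.values_left = 0;
    // Take first_value "just in case" so no stale value lingers.
    let _ = state.first_value.take();
    Some(skipped)
}
```

   The trade-off is that a corrupt page skipped this way no longer reports an error, which seems acceptable when none of its values are read.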



##########
parquet/src/encodings/decoding.rs:
##########
@@ -862,55 +921,191 @@ where
             let bit_width = self.mini_block_bit_widths[self.mini_block_idx] as usize;
             self.check_bit_width(bit_width)?;
             let mini_block_to_skip = self.mini_block_remaining.min(to_skip - skip);
-            let mini_block_should_skip = mini_block_to_skip;
-
-            let skip_count = self
-                .bit_reader
-                .get_batch(&mut skip_buffer[0..mini_block_to_skip], bit_width);
 
-            if skip_count != mini_block_to_skip {
-                return Err(general_err!(
-                    "Expected to skip {} values from mini block got {}.",
-                    mini_block_batch_size,
-                    skip_count
-                ));
-            }
-
-            // see commentary in self.get() above regarding optimizations

Review Comment:
   Not sure why this comment was dropped. Please restore it.


