XiangpengHao commented on code in PR #6945:
URL: https://github.com/apache/arrow-rs/pull/6945#discussion_r1905635907


##########
parquet/src/file/serialized_reader.rs:
##########
@@ -568,6 +568,62 @@ impl<R: ChunkReader> SerializedPageReader<R> {
             physical_type: meta.column_type(),
         })
     }
+
+    /// Similar to `peek_next_page`, but returns the offset of the next page instead of the page metadata.
+    /// Unlike page metadata, an offset can uniquely identify a page.
+    ///
+    /// This is used when we need to read parquet with row-filter, and we don't want to decompress the page twice.
+    /// This function allows us to check if the next page is being cached or read previously.
+    pub fn peek_next_page_offset(&mut self) -> Result<Option<usize>> {
+        match &mut self.state {
+            SerializedPageReaderState::Values {
+                offset,
+                remaining_bytes,
+                next_page_header,
+            } => {
+                loop {
+                    if *remaining_bytes == 0 {

Review Comment:
   > has so much duplication with peek_next_page 
   
   Agreed. I tried to make `peek_next_page` return an offset as well, but couldn't find an easy way to do it.
   
   > in a different impl block than peek_next_page
   
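   I think that's because `peek_next_page` is a method of the `PageReader` trait, so it lives in the trait `impl` block, while `peek_next_page_offset` is an inherent method of `SerializedPageReader`.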
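
A minimal sketch of that constraint, assuming simplified stand-in types: `SimpleReader`, the `advance` helper, and the cut-down `PageReader` and `PageMetadata` below are hypothetical and are not the real `parquet` definitions. It shows why a method declared on a trait and an extra inherent method end up in separate `impl` blocks, and why an offset can distinguish two pages whose metadata is identical.

```rust
// Hypothetical, simplified stand-ins; not the real `parquet` crate types.

/// Stand-in for page metadata. Two different pages can carry equal values.
#[derive(Debug, Clone, PartialEq)]
pub struct PageMetadata {
    pub num_rows: usize,
}

/// Stand-in for the trait that declares `peek_next_page`.
pub trait PageReader {
    fn peek_next_page(&mut self) -> Option<PageMetadata>;
}

/// Stand-in for a concrete reader that knows the byte offset of each page.
pub struct SimpleReader {
    /// `(offset, num_rows)` for every page, in file order.
    pages: Vec<(usize, usize)>,
    /// Index of the next page to be read.
    next: usize,
}

// Inherent impl block: methods that are not declared by any trait go here.
impl SimpleReader {
    pub fn new(pages: Vec<(usize, usize)>) -> Self {
        Self { pages, next: 0 }
    }

    /// Returns the offset of the next page without consuming it.
    pub fn peek_next_page_offset(&mut self) -> Option<usize> {
        self.pages.get(self.next).map(|&(offset, _)| offset)
    }

    /// Moves past the current page (hypothetical helper for this sketch).
    pub fn advance(&mut self) {
        if self.next < self.pages.len() {
            self.next += 1;
        }
    }
}

// Trait impl block: only methods declared in `PageReader` may appear here,
// so `peek_next_page_offset` cannot be defined in this block.
impl PageReader for SimpleReader {
    fn peek_next_page(&mut self) -> Option<PageMetadata> {
        self.pages
            .get(self.next)
            .map(|&(_, num_rows)| PageMetadata { num_rows })
    }
}
```

With two pages of 100 rows each at offsets 4 and 260, `peek_next_page` returns equal metadata for both, while `peek_next_page_offset` returns a different value for each, which is what makes the offset usable as a cache key.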


