XiangpengHao commented on code in PR #6945:
URL: https://github.com/apache/arrow-rs/pull/6945#discussion_r1905635907
##########
parquet/src/file/serialized_reader.rs:
##########
@@ -568,6 +568,62 @@ impl<R: ChunkReader> SerializedPageReader<R> {
             physical_type: meta.column_type(),
         })
     }
+
+    /// Similar to `peek_next_page`, but returns the offset of the next page instead of the page metadata.
+    /// Unlike page metadata, an offset can uniquely identify a page.
+    ///
+    /// This is used when we need to read parquet with row-filter, and we don't want to decompress the page twice.
+    /// This function allows us to check if the next page is being cached or read previously.
+    pub fn peek_next_page_offset(&mut self) -> Result<Option<usize>> {
+        match &mut self.state {
+            SerializedPageReaderState::Values {
+                offset,
+                remaining_bytes,
+                next_page_header,
+            } => {
+                loop {
+                    if *remaining_bytes == 0 {

Review Comment:
   > has so much duplication with peek_next_page

   Agreed. I tried to make `peek_next_page` return an offset as well, but had no luck doing it easily.

   > in a different impl block than peek_next_page

   I think it's because `peek_next_page` is a method of the `PageReader` trait, while `peek_next_page_offset` is an inherent method of `SerializedPageReader`, so it can't go in the trait's impl block.
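To illustrate both points in the reply, here is a minimal sketch using simplified, hypothetical stand-ins (`PageMetadata`, a toy `PageReader` trait, and a toy `SerializedPageReader` that holds pre-parsed `(offset, metadata)` pairs); none of these match the real parquet crate signatures. It shows why the offset method lives in a separate inherent `impl` block, and one possible way (not what the PR does) to cut duplication: a private helper, here called `peek_next_page_inner`, that locates the next page once and returns both offset and metadata, which each public method then projects. This leaves the trait's `peek_next_page` signature unchanged.

```rust
// Hypothetical, simplified stand-in for the real page metadata type.
#[derive(Debug, Clone, PartialEq)]
pub struct PageMetadata {
    pub num_rows: usize,
}

// Toy trait: the real `PageReader` trait has more methods and returns
// `Result<Option<PageMetadata>>`. Changing this signature would be a public API change.
pub trait PageReader {
    fn peek_next_page(&mut self) -> Option<PageMetadata>;
}

/// Toy reader over pre-parsed `(offset, metadata)` pairs (an assumption;
/// the real reader parses page headers from a `ChunkReader`).
pub struct SerializedPageReader {
    pages: Vec<(usize, PageMetadata)>,
    next: usize,
}

// Inherent impl block: methods that are not part of the trait live here.
impl SerializedPageReader {
    pub fn new(pages: Vec<(usize, PageMetadata)>) -> Self {
        Self { pages, next: 0 }
    }

    /// Hypothetical shared helper: finds the next page once and returns both
    /// its offset and its metadata. Takes `&mut self` to mirror a real reader
    /// that would parse and cache the next page header.
    fn peek_next_page_inner(&mut self) -> Option<(usize, PageMetadata)> {
        self.pages.get(self.next).cloned()
    }

    /// Returns only the offset, which uniquely identifies the page.
    pub fn peek_next_page_offset(&mut self) -> Option<usize> {
        self.peek_next_page_inner().map(|(offset, _)| offset)
    }

    /// Advances past the current page.
    pub fn skip_next_page(&mut self) {
        self.next += 1;
    }
}

// Trait impl block: only trait methods can appear here.
impl PageReader for SerializedPageReader {
    fn peek_next_page(&mut self) -> Option<PageMetadata> {
        self.peek_next_page_inner().map(|(_, meta)| meta)
    }
}
```

With this split, peeking at the same page through either method walks the same code path, so a caller can check the offset against a cache before deciding whether to decompress.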