XiangpengHao commented on code in PR #6945:
URL: https://github.com/apache/arrow-rs/pull/6945#discussion_r1905635907
##########
parquet/src/file/serialized_reader.rs:
##########
@@ -568,6 +568,62 @@ impl<R: ChunkReader> SerializedPageReader<R> {
physical_type: meta.column_type(),
})
}
+
+ /// Similar to `peek_next_page`, but returns the offset of the next page instead of the page metadata.
+ /// Unlike page metadata, an offset can uniquely identify a page.
+ ///
+ /// This is used when we need to read parquet with row-filter, and we don't want to decompress the page twice.
+ /// This function allows us to check if the next page is being cached or read previously.
+ pub fn peek_next_page_offset(&mut self) -> Result<Option<usize>> {
+ match &mut self.state {
+ SerializedPageReaderState::Values {
+ offset,
+ remaining_bytes,
+ next_page_header,
+ } => {
+ loop {
+ if *remaining_bytes == 0 {
Review Comment:
> has so much duplication with peek_next_page
Agreed. I tried to make `peek_next_page` return an offset as well, but had
no luck doing it easily.
> in a different impl block than peek_next_page
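I think it's because `peek_next_page` is part of the `PageReader` trait, so it has to live in the trait impl block, while `peek_next_page_offset` is an inherent method.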
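
To illustrate both points, here is a minimal sketch using hypothetical, simplified stand-ins (`ToyPageReader`, `PageMeta`, `peek_next`, and `advance` are made up for this example and are not the arrow-rs types). It shows why a trait method and a non-trait method end up in separate `impl` blocks, and one way a shared private helper could reduce the duplication. The real `SerializedPageReader` state handling is more involved, so this may not carry over directly.

```rust
// Hypothetical, simplified stand-ins; NOT the real arrow-rs definitions.

/// Simplified stand-in for page metadata.
#[derive(Debug, PartialEq)]
pub struct PageMeta {
    pub num_rows: usize,
}

/// Simplified stand-in for the `PageReader` trait.
pub trait PageReader {
    fn peek_next_page(&mut self) -> Option<PageMeta>;
}

/// Toy reader over a list of (offset, num_rows) pairs.
pub struct ToyPageReader {
    pages: Vec<(usize, usize)>,
    next: usize,
}

// Inherent impl block: methods that are not part of any trait live here.
impl ToyPageReader {
    pub fn new(pages: Vec<(usize, usize)>) -> Self {
        Self { pages, next: 0 }
    }

    /// Shared private helper: finds the next page once and returns both
    /// its offset and its metadata.
    fn peek_next(&self) -> Option<(usize, PageMeta)> {
        self.pages
            .get(self.next)
            .map(|&(offset, num_rows)| (offset, PageMeta { num_rows }))
    }

    /// Not a trait method, so it cannot go in the `impl PageReader` block.
    pub fn peek_next_page_offset(&mut self) -> Option<usize> {
        self.peek_next().map(|(offset, _)| offset)
    }

    /// Moves past the current page (hypothetical helper for the example).
    pub fn advance(&mut self) {
        if self.next < self.pages.len() {
            self.next += 1;
        }
    }
}

// Trait impl block: only methods declared by `PageReader` may appear here.
impl PageReader for ToyPageReader {
    fn peek_next_page(&mut self) -> Option<PageMeta> {
        self.peek_next().map(|(_, meta)| meta)
    }
}
```

Both public methods delegate to `peek_next`, so the lookup logic exists in one place even though the two entry points sit in different `impl` blocks.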
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.