wgtmac commented on code in PR #14603:
URL: https://github.com/apache/arrow/pull/14603#discussion_r1018579993
##########
cpp/src/parquet/column_reader.cc:
##########
@@ -386,6 +397,16 @@ std::shared_ptr<Page> SerializedPageReader::NextPage() {
throw ParquetException("Invalid page header");
}
+ // Once we have the header, we will call the skip_page_call_back_ to
+ // determine if we should be skipping this page. If yes, we will advance the
+ // stream to the next page.
+ if(has_skip_page_callback_) {
Review Comment:
> > What we have done is to let the PageReader be aware of the Offset Index belonging to the pages of the RowGroup
>
> Do you have example code here for what you mean?
>
> > I can pick up [ARROW-10158](https://issues.apache.org/jira/browse/ARROW-10158) to contribute our implementation.
>
> Is the implementation compatible with the callback approach? If you are willing to contribute it, it seems like it would be valuable.

I agree with what @fatemehp has said. My proposal is orthogonal to (and
compatible with) the current patch. I can contribute the Page Index
implementation, and the reader can then use the Offset Index to skip pages
when it is available.
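
To illustrate how the two approaches could coexist, here is a rough sketch, not the actual implementation. `PageLocation` mirrors the fields of the Parquet format's Offset Index entries; `PageHeaderInfo`, `SkipPageCallback`, `PageOverlapsRows`, and `ShouldSkipPage` are hypothetical names made up for this example and are not part of the Arrow C++ API or of this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for one Offset Index entry (fields follow the Parquet format's
// PageLocation; this is not an Arrow C++ class).
struct PageLocation {
  int64_t offset;                // file offset of the page
  int32_t compressed_page_size;  // size of the page including its header
  int64_t first_row_index;       // first row of the page within the row group
};

// Hypothetical, simplified view of a page header passed to the callback.
struct PageHeaderInfo {
  int32_t num_values;
};

// Hypothetical callback type: returns true if the page should be skipped.
using SkipPageCallback = std::function<bool(const PageHeaderInfo&)>;

// True if page `page` contains any row in the inclusive range
// [first_row, last_row]. The last row of a page is derived from the next
// entry's first_row_index, or from the row group's row count for the final page.
bool PageOverlapsRows(const std::vector<PageLocation>& index, size_t page,
                      int64_t num_rows_in_group, int64_t first_row,
                      int64_t last_row) {
  assert(page < index.size() && first_row <= last_row);
  const int64_t page_first = index[page].first_row_index;
  const int64_t next_first = (page + 1 < index.size())
                                 ? index[page + 1].first_row_index
                                 : num_rows_in_group;
  const int64_t page_last = next_first - 1;
  return page_first <= last_row && page_last >= first_row;
}

// Prefer the Offset Index when it is available; otherwise fall back to the
// header-based callback. With no index and no callback, nothing is skipped.
bool ShouldSkipPage(const std::vector<PageLocation>* offset_index, size_t page,
                    int64_t num_rows_in_group, int64_t first_row,
                    int64_t last_row, const PageHeaderInfo& header,
                    const SkipPageCallback& callback) {
  if (offset_index != nullptr) {
    return !PageOverlapsRows(*offset_index, page, num_rows_in_group,
                             first_row, last_row);
  }
  return callback ? callback(header) : false;
}
```

In a real reader the Offset Index decision could be made before the page header is read at all, since `offset` and `compressed_page_size` are enough to seek past the page; the sketch passes the header only so that both paths share one signature.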