emkornfield commented on code in PR #14603:
URL: https://github.com/apache/arrow/pull/14603#discussion_r1030826876
##########
cpp/src/parquet/column_reader.cc:
##########
@@ -386,6 +386,57 @@ std::shared_ptr<Page> SerializedPageReader::NextPage() {
throw ParquetException("Invalid page header");
}
+ // Do some checks before trying to decrypt and/or decompress the page.
+ // Also skip the page if skip_page_callback_ is set and returns true.
+ const PageType::type page_type = LoadEnumSafe(&current_page_header_.type);
+ EncodedStatistics page_statistics;
+ if (page_type == PageType::DATA_PAGE) {
+ const format::DataPageHeader& header = current_page_header_.data_page_header;
+ if (header.num_values < 0) {
+ throw ParquetException("Invalid page header (negative number of values)");
+ }
+ page_statistics = ExtractStatsFromHeader(header);
Review Comment:
Maybe just move this call to the spots where `page_statistics` is actually used?
At the very least, we should probably only do the conversion if
`skip_page_callback_` is set.
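
A minimal sketch of the second option, to make the suggestion concrete. Everything here is a hypothetical stand-in, not the real parquet-cpp code: `FakeDataPageHeader`, `FakeStats`, `ExtractStats`, `ShouldSkipPage`, and the `g_conversions` counter are made up for illustration, and the callback signature is assumed. The point is only that the header validation always runs, while the statistics conversion happens only when a callback is set.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>

// Hypothetical stand-ins for the Thrift page header and the decoded
// statistics; these are NOT the real parquet-cpp types.
struct FakeDataPageHeader {
  int32_t num_values = 0;
  int64_t null_count = 0;
};

struct FakeStats {
  int64_t null_count = 0;
};

// Counts conversions so the laziness can be observed (illustration only).
int g_conversions = 0;

// Stand-in for the header-to-statistics conversion.
FakeStats ExtractStats(const FakeDataPageHeader& header) {
  ++g_conversions;
  FakeStats stats;
  stats.null_count = header.null_count;
  return stats;
}

// Assumed callback shape: returns true if the page should be skipped.
using SkipPageCallback = std::function<bool(const FakeStats&)>;

// Validates the header unconditionally, but converts the statistics only
// when a callback is set.
bool ShouldSkipPage(const FakeDataPageHeader& header,
                    const SkipPageCallback& skip_page_callback) {
  if (header.num_values < 0) {
    throw std::runtime_error("Invalid page header (negative number of values)");
  }
  if (!skip_page_callback) {
    return false;  // No callback set: skip the conversion entirely.
  }
  return skip_page_callback(ExtractStats(header));
}
```

With this shape, readers that never set a callback pay nothing for the statistics conversion.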