frazar commented on code in PR #38360:
URL: https://github.com/apache/arrow/pull/38360#discussion_r1375475701
##########
python/pyarrow/parquet/core.py:
##########
@@ -3004,6 +3025,7 @@ def read_table(source, *, columns=None, use_threads=True,
metadata=None,
decryption_properties=decryption_properties,
thrift_string_size_limit=thrift_string_size_limit,
thrift_container_size_limit=thrift_container_size_limit,
+ page_checksum_verification=page_checksum_verification,
Review Comment:
Thank you! Added the following lines for C++ logging:
```diff
diff --git a/cpp/src/parquet/column_reader.cc b/cpp/src/parquet/column_reader.cc
index ecc48811e..3cd5e35c0 100644
--- a/cpp/src/parquet/column_reader.cc
+++ b/cpp/src/parquet/column_reader.cc
@@ -491,11 +491,20 @@ std::shared_ptr<Page> SerializedPageReader::NextPage() {
const PageType::type page_type = LoadEnumSafe(&current_page_header_.type);
+ std::cout << "current page type is: " << static_cast<int>(page_type) << std::boolalpha
+ << ", isset crc is: " << current_page_header_.__isset.crc << std::endl;
+
+ std::cout << "properties_.page_checksum_verification(): " << properties_.page_checksum_verification() << std::endl;
+ std::cout << "current_page_header_.__isset.crc: " << current_page_header_.__isset.crc << std::endl;
+ std::cout << "PageCanUseChecksum(page_type): " << PageCanUseChecksum(page_type) << std::endl;
if (properties_.page_checksum_verification() &&
current_page_header_.__isset.crc &&
PageCanUseChecksum(page_type)) {
// verify crc
```
This is to understand why execution does not enter the `if` statement at the
end of the diff.
When running the tests, I get the following logs:
```
current page type is: 2, isset crc is: true
properties_.page_checksum_verification(): false
current_page_header_.__isset.crc: true
PageCanUseChecksum(page_type): true
current page type is: 0, isset crc is: true
properties_.page_checksum_verification(): false
current_page_header_.__isset.crc: true
PageCanUseChecksum(page_type): true
```
This means the check is skipped because
`properties_.page_checksum_verification()` returns `false` rather than `true`!
I'm still not sure why, though.
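
To make the reading of the logs explicit, here is a minimal Python sketch that models the three operands of the C++ `if` and reports which ones are `false`. The names `CrcGate` and `blocking_operands` are hypothetical and exist only for this illustration; they are not part of Arrow or pyarrow. The values are copied from the log output above.

```python
from typing import NamedTuple


class CrcGate(NamedTuple):
    """Hypothetical stand-in for the three operands of the C++ `if`."""

    verification_enabled: bool  # properties_.page_checksum_verification()
    header_has_crc: bool        # current_page_header_.__isset.crc
    page_type_supported: bool   # PageCanUseChecksum(page_type)


def blocking_operands(gate: CrcGate) -> list[str]:
    """Return the names of the operands that evaluate to False."""
    return [name for name, value in gate._asdict().items() if not value]


# Values taken from the logs above (identical for page types 2 and 0).
logged = CrcGate(
    verification_enabled=False,
    header_has_crc=True,
    page_type_supported=True,
)

print(blocking_operands(logged))  # ['verification_enabled']
```

Only the reader property blocks the branch, so the remaining question is where the `True` passed from Python gets lost before it reaches `properties_`.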
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.