frazar commented on code in PR #38360:
URL: https://github.com/apache/arrow/pull/38360#discussion_r1375475701


##########
python/pyarrow/parquet/core.py:
##########
@@ -3004,6 +3025,7 @@ def read_table(source, *, columns=None, use_threads=True, metadata=None,
                 decryption_properties=decryption_properties,
                 thrift_string_size_limit=thrift_string_size_limit,
                 thrift_container_size_limit=thrift_container_size_limit,
+                page_checksum_verification=page_checksum_verification,

Review Comment:
   Thank you! Added the following lines for C++ logging:
   
   ```diff
   diff --git a/cpp/src/parquet/column_reader.cc b/cpp/src/parquet/column_reader.cc
   index ecc48811e..3cd5e35c0 100644
   --- a/cpp/src/parquet/column_reader.cc
   +++ b/cpp/src/parquet/column_reader.cc
   @@ -491,11 +491,20 @@ std::shared_ptr<Page> SerializedPageReader::NextPage() {
   
        const PageType::type page_type = LoadEnumSafe(&current_page_header_.type);
   
   +    std::cout << "current page type is: " << static_cast<int>(page_type) << std::boolalpha
   +            << ", isset crc is: " << current_page_header_.__isset.crc << std::endl;
   +
   +    std::cout << "properties_.page_checksum_verification(): " << properties_.page_checksum_verification() << std::endl;
   +    std::cout << "current_page_header_.__isset.crc: " << current_page_header_.__isset.crc << std::endl;
   +    std::cout << "PageCanUseChecksum(page_type): " << PageCanUseChecksum(page_type) << std::endl;
   
        if (properties_.page_checksum_verification() && current_page_header_.__isset.crc &&
            PageCanUseChecksum(page_type)) {
          // verify crc
   ```    
   
   This is to understand why execution does not enter the `if` statement on the last line of the diff.
   
   When running the tests, I get the following logs:
   ```
   current page type is: 2, isset crc is: true
   properties_.page_checksum_verification(): false
   current_page_header_.__isset.crc: true
   PageCanUseChecksum(page_type): true
   current page type is: 0, isset crc is: true
   properties_.page_checksum_verification(): false
   current_page_header_.__isset.crc: true
   PageCanUseChecksum(page_type): true
   ```
   
   This means the failure occurs because `properties_.page_checksum_verification()` returns `false` rather than `true`.
   I'm still not sure why, though.
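
To make the reading of the log concrete, here is a minimal sketch that models the guard from the diff in Python and feeds it the three values printed above. `PageState`, `should_verify_crc`, and `blocking_conditions` are hypothetical names used only for illustration; they are not part of Arrow or pyarrow, and the sketch says nothing about why the property ends up `false`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageState:
    """Hypothetical stand-in for the three values printed in the log."""
    verification_enabled: bool   # properties_.page_checksum_verification()
    crc_is_set: bool             # current_page_header_.__isset.crc
    page_can_use_checksum: bool  # PageCanUseChecksum(page_type)


def should_verify_crc(state: PageState) -> bool:
    # Mirrors the conjunction guarding the "verify crc" branch:
    # all three operands must be true for the branch to run.
    return (state.verification_enabled
            and state.crc_is_set
            and state.page_can_use_checksum)


def blocking_conditions(state: PageState) -> List[str]:
    # Names of the operands that are False, i.e. the ones that
    # keep execution out of the branch.
    return [name for name, value in vars(state).items() if not value]


# Values taken from the log lines above (identical for both pages).
logged = PageState(verification_enabled=False,
                   crc_is_set=True,
                   page_can_use_checksum=True)

print(should_verify_crc(logged))    # False
print(blocking_conditions(logged))  # ['verification_enabled']
```

With the logged values, only the reader property blocks the branch, so the next thing to check is how `page_checksum_verification` is forwarded from `read_table` down to the C++ reader properties.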
   
     


