frazar commented on PR #38360: URL: https://github.com/apache/arrow/pull/38360#issuecomment-1786107127
Added even more logs [in this branch](https://github.com/frazar/arrow/tree/parquet/python-support-crc-NEW-logs), and got a surprising result:

```
dataset: read_options_args: {'coerce_int96_timestamp_unit': None}
scan_args {'pre_buffer': True, 'thrift_string_size_limit': None, 'thrift_container_size_limit': None, 'page_checksum_verification': True}
read_options: None
default_fragment_scan_options: None
build scanOptions with {'pre_buffer': True, 'thrift_string_size_limit': None, 'thrift_container_size_limit': None, 'page_checksum_verification': True}
set_page_checksum_verification() called with: check_crc=true   <--- Here the C++ setter is called with argument true
page_checksum_verification_ is now: true
page_checksum_verification() called, returning: false   <--- Here the C++ getter is called, but returns false!
Open with crc: false
current page type is: 2, isset crc is: true
page_checksum_verification() called, returning: false
properties_.page_checksum_verification(): false
current_page_header_.__isset.crc: true
PageCanUseChecksum(page_type): true
page_checksum_verification() called, returning: false
current page type is: 0, isset crc is: true
page_checksum_verification() called, returning: false
properties_.page_checksum_verification(): false
current_page_header_.__isset.crc: true
PageCanUseChecksum(page_type): true
page_checksum_verification() called, returning: false
```

I see 2 possible explanations:

- Either something is zeroing `page_checksum_verification_` without using the setter,
- Or we are looking at method calls on two different instances.
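To tell the two explanations apart, one option is to also log the `this` pointer in both the setter and the getter. The sketch below is only an illustration of the second explanation, not Arrow's actual code: `ReaderProps`, `ConfigureByValue`, `ConfigureByRef` and `RunDemo` are hypothetical names, and the class just mimics the setter/getter pair from the logs. If the properties object is copied somewhere between the Python binding and the page reader, the setter would run on the copy and the getter on the original would still return false.

```cpp
#include <cstdio>

// Hypothetical stand-in for the properties class seen in the logs.
// This is NOT Arrow's implementation, only a minimal model of it.
class ReaderProps {
 public:
  void set_page_checksum_verification(bool check_crc) {
    page_checksum_verification_ = check_crc;
    // Logging `this` shows which instance the setter ran on.
    std::printf("setter on %p: now %s\n", static_cast<const void*>(this),
                check_crc ? "true" : "false");
  }

  bool page_checksum_verification() const {
    // Logging `this` shows which instance the getter ran on.
    std::printf("getter on %p: returning %s\n", static_cast<const void*>(this),
                page_checksum_verification_ ? "true" : "false");
    return page_checksum_verification_;
  }

 private:
  bool page_checksum_verification_ = false;
};

// Hypothetical helper: takes the object by value, so the setter
// modifies a temporary copy and the caller's object is unchanged.
void ConfigureByValue(ReaderProps props) {
  props.set_page_checksum_verification(true);
}

// Hypothetical helper: takes the object by reference, so the setter
// modifies the caller's object.
void ConfigureByRef(ReaderProps& props) {
  props.set_page_checksum_verification(true);
}

struct DemoResult {
  bool after_copy;  // getter result after configuring a copy
  bool after_ref;   // getter result after configuring the original
};

DemoResult RunDemo() {
  ReaderProps props;
  DemoResult result;
  ConfigureByValue(props);
  result.after_copy = props.page_checksum_verification();
  ConfigureByRef(props);
  result.after_ref = props.page_checksum_verification();
  return result;
}
```

In the by-value case the setter and getter print different addresses, which would match the second explanation. If the real logs showed the same address in both calls, the first explanation (something writing to the member without going through the setter) would become the more likely one.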
