AlenkaF commented on PR #38360:
URL: https://github.com/apache/arrow/pull/38360#issuecomment-1801603020

   > Since I am not familiar with all the different formats that include the 
   > checksum, could you provide a comprehensive list so that I can create a test 
   > for each?
   
   From the PyArrow side, the `pyarrow.parquet` APIs that can read a Parquet 
file (that is, every place in this PR where `page_checksum_verification` is 
added) are:
   
   - ParquetDatasetV2 (works correctly)
   - ParquetDataset (legacy API; I am not sure we need checksum verification 
here, as it is deprecated)
     - https://github.com/apache/arrow/blob/47222b2794c6c804ca3a351cc6d8544d952365ba/python/pyarrow/parquet/core.py#L1846
   - ParquetFile (does not raise an error, but I think it should, as it uses 
`ParquetReader` to open the file)
     - https://github.com/apache/arrow/blob/47222b2794c6c804ca3a351cc6d8544d952365ba/python/pyarrow/parquet/core.py#L333
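
A rough sketch of how one test could cover each entry point above: corrupt a copy of a Parquet file, then check that every reader raises when verification is on. It assumes `ParquetDataset` and `ParquetFile` accept the `page_checksum_verification` keyword as added in this PR. The names `swap_bytes`, `readers_under_test`, and `verification_raises` are hypothetical helpers, not PyArrow APIs, and the byte offsets are left to the caller because they depend on the file layout.

```python
from pathlib import Path


def swap_bytes(src, dst, i, j):
    """Copy src to dst with the bytes at offsets i and j exchanged."""
    data = bytearray(Path(src).read_bytes())
    if data[i] == data[j]:
        # Swapping identical bytes would leave the file unchanged.
        raise ValueError("bytes are equal; swapping would not corrupt the file")
    data[i], data[j] = data[j], data[i]
    Path(dst).write_bytes(bytes(data))


def readers_under_test():
    """Map a label to a callable that reads a whole file with verification on."""
    # Imported lazily; assumes a PyArrow build that includes this PR.
    import pyarrow.parquet as pq

    return {
        "ParquetDataset": lambda p: pq.ParquetDataset(
            p, page_checksum_verification=True
        ).read(),
        "ParquetFile": lambda p: pq.ParquetFile(
            p, page_checksum_verification=True
        ).read(),
    }


def verification_raises(corrupted_path):
    """Return {label: True if reading the corrupted file raised, else False}."""
    results = {}
    for name, read in readers_under_test().items():
        try:
            read(corrupted_path)
        except Exception:  # the exact exception type is not assumed here
            results[name] = True
        else:
            results[name] = False
    return results
```

The input file is assumed to have been written with page checksums enabled, and the swapped offsets must fall inside page data for the corruption to be detectable. A `False` entry in the result would point at a reader that silently ignores the flag, which is the behaviour described for `ParquetFile` above.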
   


