[PR] GH-39780: [Python][Parquet] Support hashing for FileMetaData and ParquetSchema [arrow]

via GitHub Wed, 24 Jan 2024 05:30:45 -0800


milesgranger opened a new pull request, #39781:
URL: https://github.com/apache/arrow/pull/39781


   I think the hash, especially for `FileMetaData`, could be better. Maybe just 
use the return value of `__repr__`, even though that won't include row group info?
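
A minimal sketch of the `__repr__`-based idea, assuming a hypothetical stand-in class (`FakeMetaData` and its fields are illustrative only and are not pyarrow's `FileMetaData`). It shows the one rule the hash has to respect: objects that compare equal must hash equal.

```python
class FakeMetaData:
    """Hypothetical stand-in for a metadata object; not a pyarrow class."""

    def __init__(self, num_columns, num_rows, num_row_groups, created_by):
        self.num_columns = num_columns
        self.num_rows = num_rows
        self.num_row_groups = num_row_groups
        self.created_by = created_by

    def __repr__(self):
        # Summary only: per-row-group details are deliberately left out,
        # mirroring the limitation mentioned above.
        return (
            f"FakeMetaData(num_columns={self.num_columns}, "
            f"num_rows={self.num_rows}, "
            f"num_row_groups={self.num_row_groups}, "
            f"created_by={self.created_by!r})"
        )

    def __eq__(self, other):
        if not isinstance(other, FakeMetaData):
            return NotImplemented
        # In this sketch equality is also based on the repr, so the
        # hash/eq contract holds trivially.
        return repr(self) == repr(other)

    def __hash__(self):
        # Hash the summary string produced by __repr__.
        return hash(repr(self))


a = FakeMetaData(3, 100, 2, "writer 1.0")
b = FakeMetaData(3, 100, 2, "writer 1.0")
c = FakeMetaData(3, 200, 2, "writer 1.0")
unique = {a, b, c}  # a and b collapse into a single entry
```

If `__eq__` compared more than the repr does (row group info, for example), two unequal objects could share a hash. That is still valid, since it only costs extra equality checks on collision. Note also that Python salts `hash()` for strings per process, so a repr-based hash is not stable across interpreter runs unless `PYTHONHASHSEED` is fixed.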
   
   ### Rationale for this change
   
   Helpful for dependent projects. 
   
   ### What changes are included in this PR?
   
   Implement `__hash__` for `ParquetSchema` and `FileMetaData`.
   
   ### Are these changes tested?
   
   Yes
   
   ### Are there any user-facing changes?
   
   Supports hashing metadata:
   
   ```python
   In [1]: import pyarrow.parquet as pq
   
   In [2]: f = pq.ParquetFile('test.parquet')
   
   In [3]: hash(f.metadata)
   Out[3]: 4816453453708427907
   
   In [4]: hash(f.metadata.schema)
   Out[4]: 2300988959078172540
   ```
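
As a rough illustration of why hashing helps dependent projects, hashable objects can be used as dict keys or set members, for example to group files that share a schema. The sketch below uses a hypothetical frozen dataclass (`StandInSchema`) in place of a real Parquet schema, and `group_by_schema` is an illustrative helper, not part of pyarrow.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInSchema:
    """Hypothetical hashable stand-in for a Parquet schema (not a pyarrow class)."""

    columns: tuple


def group_by_schema(files):
    """Group file names by their schema, using the schema itself as the dict key."""
    groups = defaultdict(list)
    for name, schema in files:
        groups[schema].append(name)
    return dict(groups)


files = [
    ("a.parquet", StandInSchema(("id", "value"))),
    ("b.parquet", StandInSchema(("id", "value"))),
    ("c.parquet", StandInSchema(("id", "name"))),
]
groups = group_by_schema(files)
```

With a real `__hash__` on the metadata and schema objects, the same pattern would presumably work directly on `f.metadata.schema` without a wrapper key.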

