milesgranger commented on code in PR #39781:
URL: https://github.com/apache/arrow/pull/39781#discussion_r1485993234


##########
python/pyarrow/_parquet.pyx:
##########
@@ -849,6 +849,18 @@ cdef class FileMetaData(_Weakrefable):
         cdef Buffer buffer = sink.getvalue()
         return _reconstruct_filemetadata, (buffer,)
 
+    def __hash__(self):
+        def flatten(obj):

Review Comment:
   Thanks for confirming my suspicion that the current approach was abysmal. I just wasn't sure _what_ belonged in there. The schema at least, but `FileMetaData` seemed to be a superset of the schema in some way, so it ought to include a few other things; hence I added a few more attributes in the follow-up. Please feel free to suggest any changes there. :) 
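   To illustrate the general idea (hash a small tuple of cheap, immutable attributes instead of flattening the whole object), here is a minimal sketch. It uses hypothetical stand-in classes (`SchemaLike`, `FileMetaDataLike`), not pyarrow's real `FileMetaData`, and the chosen fields are assumptions for illustration only. The one hard rule it respects is Python's: objects that compare equal must hash equal, so hashing a subset of the fields compared by `__eq__` is always safe.

   ```python
   from dataclasses import dataclass
   from typing import Tuple


   # Hypothetical stand-in for a schema; NOT a pyarrow class.
   @dataclass(frozen=True)
   class SchemaLike:
       # (column name, physical type) pairs; a tuple keeps it hashable
       columns: Tuple[Tuple[str, str], ...]


   # Hypothetical stand-in for file metadata; NOT pyarrow's FileMetaData.
   @dataclass(frozen=True)
   class FileMetaDataLike:
       schema: SchemaLike
       num_rows: int
       num_row_groups: int
       format_version: str
       created_by: str
       # Potentially large per-row-group data: still compared by the
       # generated __eq__, but deliberately left out of the hash below.
       row_group_stats: tuple = ()

       def __hash__(self):
           # Hash only cheap identifying attributes. Since __eq__ compares
           # all fields, equal objects always produce equal hashes.
           return hash((
               self.schema,
               self.num_rows,
               self.num_row_groups,
               self.format_version,
               self.created_by,
           ))
   ```

   Leaving the bulky parts out of the hash keeps `hash()` fast; the cost is that objects differing only in those parts collide and fall back to `__eq__`, which is correct, just slower for that rare case.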
   
   Timings I have for https://github.com/apache/arrow/pull/39781/commits/45059380531161759fe093281ff97988c9207bca are:
   ```python
   In [1]: import pyarrow.parquet as pq
   
   In [2]: meta = pq.read_metadata("../../lineitem-stats-no-index.snappy.parquet")
   
   In [3]: schema = meta.schema.to_arrow_schema()
   
   In [4]: %timeit hash(meta)
   4.34 µs ± 12.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   
   In [5]: %timeit hash(schema)
   8.34 µs ± 31.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
   
   ```



