milesgranger commented on code in PR #39781:
URL: https://github.com/apache/arrow/pull/39781#discussion_r1485993573


##########
python/pyarrow/_parquet.pyx:
##########
@@ -1071,6 +1083,11 @@ cdef class ParquetSchema(_Weakrefable):
     def __getitem__(self, i):
         return self.column(i)
 
+    def __hash__(self):
+        return hash(
+            (object.__hash__(self), frombytes(self.schema.ToString(), safe=True))

Review Comment:
   Addressed in https://github.com/apache/arrow/pull/39781/commits/45059380531161759fe093281ff97988c9207bca
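The hunk above hashes a tuple of `object.__hash__(self)` and the schema's string form. Python's data model requires that objects which compare equal also hash equally. If `ParquetSchema.__eq__` compares schema content (an assumption here), the identity-based `object.__hash__` component would usually give two equal but separately read schemas different hashes. A minimal sketch contrasting the two approaches, using hypothetical stand-in classes (`ContentHashedSchema` and `IdentityMixedSchema` are illustrative, not pyarrow classes):

```python
class ContentHashedSchema:
    """Hypothetical stand-in for a schema wrapper (not a pyarrow class)."""

    def __init__(self, text):
        # Plays the role of frombytes(self.schema.ToString(), safe=True).
        self._text = text

    def __eq__(self, other):
        if not isinstance(other, ContentHashedSchema):
            return NotImplemented
        return self._text == other._text

    def __hash__(self):
        # Hash only what __eq__ compares, so equal objects hash equally.
        return hash(self._text)


class IdentityMixedSchema(ContentHashedSchema):
    """Variant that also mixes in object identity, as in the hunk above."""

    def __hash__(self):
        # object.__hash__ is identity-based, so two equal but distinct
        # instances will, in practice, get different hashes.
        return hash((object.__hash__(self), self._text))


text = "a: int64\nb: double"
a, b = ContentHashedSchema(text), ContentHashedSchema(text)
assert a == b and hash(a) == hash(b)

c, d = IdentityMixedSchema(text), IdentityMixedSchema(text)
assert c == d  # equal by content...
print("same hash?", hash(c) == hash(d))  # ...but usually False
```

Hashing only the content keeps `__hash__` consistent with `__eq__`, which is what lets such objects deduplicate correctly in sets and dict keys.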



##########
python/pyarrow/tests/parquet/test_metadata.py:
##########
@@ -499,6 +499,24 @@ def test_multi_dataset_metadata(tempdir):
     assert md['serialized_size'] > 0
 
 
+def test_metadata_hashing(tempdir):
+    path1 = str(tempdir / "metadata1")
+    schema1 = pa.schema([("a", "int64"), ("b", "float64")])
+    pq.write_metadata(schema1, path1)
+    parquet_meta1 = pq.read_metadata(path1)
+
+    path2 = str(tempdir / "metadata2")
+    schema2 = pa.schema([("a", "int64"), ("b", "float32")])
+    pq.write_metadata(schema2, path2)
+    parquet_meta2 = pq.read_metadata(path2)
+
+    # Deterministic
+    assert hash(parquet_meta1) == hash(parquet_meta1)

Review Comment:
   Done in https://github.com/apache/arrow/pull/39781/commits/45059380531161759fe093281ff97988c9207bca, thanks.
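The test above checks that hashing the same metadata object twice is deterministic. A stronger property worth checking is that two separately read copies of the same metadata hash equally and behave as a single set or dict key. A minimal sketch of that test shape, assuming a hypothetical frozen-dataclass stand-in (`FakeMetadata` and `read_fake_metadata` are illustrative, not pyarrow APIs) so it runs without writing Parquet files:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeMetadata:
    """Hypothetical stand-in for FileMetaData; fields are illustrative."""
    schema_text: str
    num_rows: int


def read_fake_metadata(schema_text):
    # Stand-in for pq.read_metadata(path): returns a fresh object per call.
    return FakeMetadata(schema_text=schema_text, num_rows=0)


def test_metadata_hashing_sketch():
    meta1 = read_fake_metadata("a: int64, b: double")
    meta1_again = read_fake_metadata("a: int64, b: double")
    meta2 = read_fake_metadata("a: int64, b: float")

    # Deterministic: hashing the same object twice gives the same value.
    assert hash(meta1) == hash(meta1)
    # Equal content read twice must hash equally (the hash/eq contract).
    assert meta1 == meta1_again
    assert hash(meta1) == hash(meta1_again)
    # Different schemas compare unequal.
    assert meta1 != meta2
    # Usable as set keys: equal objects collapse into one entry.
    assert len({meta1, meta1_again, meta2}) == 2


test_metadata_hashing_sketch()
```

Asserting `meta1 != meta2` rather than `hash(meta1) != hash(meta2)` keeps the test free of a rare spurious failure on a hash collision, since unequal objects are only likely, not guaranteed, to hash differently.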



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
