This is an automated email from the ASF dual-hosted git repository.

raulcd pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new ef718a7da5 GH-47602: [Python] Make Schema hashable even when it has metadata (#47601)
ef718a7da5 is described below

commit ef718a7da55e053c92b5ae24dced5b644fe63e52
Author: Jonas Dedden <[email protected]>
AuthorDate: Fri Oct 3 11:10:21 2025 +0200

    GH-47602: [Python] Make Schema hashable even when it has metadata (#47601)
    
    ### Rationale for this change
    
    Previously, `pyarrow.Schema` was not hashable in Python when it had `metadata` set.
    
    ```
    >>> import pyarrow
    >>> schema = pyarrow.schema([], metadata={b"1": b"1"})
    >>> hash(schema)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pyarrow/types.pxi", line 2921, in pyarrow.lib.Schema.__hash__
    TypeError: unhashable type: 'dict'
    ```
    
    This is because `Schema.__hash__` tried to hash the metadata (a dict) as-is, and dicts are not hashable.
    
    ### What changes are included in this PR?
    
    Slightly change how the hash is computed for `Schema`: the metadata `dict[bytes, bytes]` is converted to a frozenset of (key, value) tuples before hashing.
    
    For reference, this is faster than computing the hash of a sorted tuple of (key, value) tuples (https://stackoverflow.com/a/6014481/10070873).
    
    ### Are these changes tested?
    
    Yes.
    
    ### Are there any user-facing changes?
    
    No, other than `Schema` now being hashable when it has metadata.
    * GitHub Issue: #47602
    
    Lead-authored-by: Jonas Dedden <[email protected]>
    Co-authored-by: Alenka Frim <[email protected]>
    Signed-off-by: Raúl Cumplido <[email protected]>
---
 python/pyarrow/tests/test_schema.py | 22 ++++++++++++++++++++++
 python/pyarrow/types.pxi            |  3 ++-
 2 files changed, 24 insertions(+), 1 deletion(-)
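
Note: the following is a minimal sketch of the approach described in the commit message, written in plain Python so it runs without pyarrow. The helper name `metadata_hash_key` is hypothetical and not part of the patch; it only mirrors the expression used in the new `__hash__` to show why a frozenset of (key, value) pairs is hashable and independent of insertion order, while the dict itself is not.

```python
def metadata_hash_key(metadata):
    """Hypothetical helper mirroring the expression in the patched __hash__.

    Returns a hashable, order-independent key for an optional metadata dict.
    """
    return frozenset(metadata.items() if metadata else {})


# Two dicts with the same entries in a different insertion order.
a = {b"k1": b"v1", b"k2": b"v2"}
b = {b"k2": b"v2", b"k1": b"v1"}

# Hashing the dict directly fails, which is the original bug.
try:
    hash(a)
    dict_hashable = True
except TypeError:
    dict_hashable = False

# The frozenset keys compare equal, so their hashes are equal too.
same_key = metadata_hash_key(a) == metadata_hash_key(b)
same_hash = hash(metadata_hash_key(a)) == hash(metadata_hash_key(b))

# Empty metadata and missing metadata map to the same (empty) key.
empty_matches_none = metadata_hash_key({}) == metadata_hash_key(None)
```

Because both `{}` and `None` fall through to the empty frozenset, schemas with empty and with absent metadata hash identically, which matches the last two assertions in the new test.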

diff --git a/python/pyarrow/tests/test_schema.py b/python/pyarrow/tests/test_schema.py
index a1197ed2d0..029e14ca16 100644
--- a/python/pyarrow/tests/test_schema.py
+++ b/python/pyarrow/tests/test_schema.py
@@ -482,6 +482,28 @@ def test_schema_set_field():
     assert s3.field(0).nullable is False
 
 
+def test_schema_hash_metadata():
+    fields = [
+        pa.field("foo", pa.int32()),
+    ]
+
+    schema1 = pa.schema(fields, metadata={b'foo': b'bar'})
+    schema2 = pa.schema(fields, metadata={b'foo': b'bar'})
+    schema3 = pa.schema(fields, metadata={b'foo_different': b'bar'})
+    schema4 = pa.schema(fields, metadata={b'foo': b'bar_different'})
+
+    assert hash(schema1) == hash(schema2)
+    assert hash(schema1) != hash(schema3)
+    assert hash(schema1) != hash(schema4)
+    assert hash(schema3) != hash(schema4)
+
+    schema_empty1 = pa.schema(fields, metadata={})
+    schema_empty2 = pa.schema(fields, metadata=None)
+
+    assert hash(schema_empty1) == hash(schema_empty2)
+    assert hash(schema_empty1) != hash(schema1)
+
+
 def test_schema_equals():
     fields = [
         pa.field('foo', pa.int32()),
diff --git a/python/pyarrow/types.pxi b/python/pyarrow/types.pxi
index 2212240b8b..7d9261cf85 100644
--- a/python/pyarrow/types.pxi
+++ b/python/pyarrow/types.pxi
@@ -2918,7 +2918,8 @@ cdef class Schema(_Weakrefable):
         return schema, (list(self), self.metadata)
 
     def __hash__(self):
-        return hash((tuple(self), self.metadata))
+        metadata = frozenset(self.metadata.items() if self.metadata else {})
+        return hash((tuple(self), metadata))
 
     def __sizeof__(self):
         size = 0
