This is an automated email from the ASF dual-hosted git repository.
raulcd pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new ef718a7da5 GH-47602: [Python] Make Schema hashable even when it has
metadata (#47601)
ef718a7da5 is described below
commit ef718a7da55e053c92b5ae24dced5b644fe63e52
Author: Jonas Dedden <[email protected]>
AuthorDate: Fri Oct 3 11:10:21 2025 +0200
GH-47602: [Python] Make Schema hashable even when it has metadata (#47601)
### Rationale for this change
In Python, `pyarrow.Schema` was previously not hashable when it had `metadata`
set.
```
>>> import pyarrow
>>> schema = pyarrow.schema([], metadata={b"1": b"1"})
>>> hash(schema)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/types.pxi", line 2921, in pyarrow.lib.Schema.__hash__
TypeError: unhashable type: 'dict'
```
This is because the metadata (a dict) was passed to `hash()` as-is, and
dicts are not hashable.
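The same failure can be reproduced without pyarrow, because hashing a tuple hashes each of its elements. A minimal sketch, where `fields`, `metadata`, and the helper `try_hash` are stand-ins chosen for illustration and are not pyarrow names:

```python
# Stand-ins for the two pieces the old __hash__ combined (hypothetical values).
fields = ("foo", "bar")
metadata = {b"1": b"1"}


def try_hash(value):
    """Return hash(value), or the TypeError message if value is unhashable."""
    try:
        return hash(value)
    except TypeError as exc:
        return str(exc)


# Hashing a tuple hashes every element, so a single dict makes it unhashable.
result_with_dict = try_hash((fields, metadata))

# Assumption: with no metadata the second element is None, which is hashable.
result_with_none = try_hash((fields, None))
```

This is consistent with the problem only showing up for schemas that carry metadata.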
### What changes are included in this PR?
Slightly change how hashes are computed for Schema by converting the
metadata `dict` to a frozenset of (key, value) tuples.
For reference, this is faster than computing the hash of a sorted tuple of
(key, value) tuples (https://stackoverflow.com/a/6014481/10070873).
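As a rough illustration of the idea in plain Python (the helper name `metadata_key` is hypothetical and not part of pyarrow), a frozenset of the dict's items is hashable and does not depend on insertion order:

```python
def metadata_key(metadata):
    """Build a hashable, order-independent stand-in for a metadata dict."""
    # None and {} are both falsy, so both map to the empty frozenset.
    return frozenset(metadata.items() if metadata else {})


a = metadata_key({b"k1": b"v1", b"k2": b"v2"})
b = metadata_key({b"k2": b"v2", b"k1": b"v1"})  # same pairs, other insertion order
c = metadata_key({b"k1": b"other"})

same_hash = hash(a) == hash(b)          # equal frozensets hash equally
empty_matches_none = metadata_key({}) == metadata_key(None)
differs = a != c                        # different pairs give different keys
```

Because the frozenset is unordered, no sorting step is needed before hashing, which is the trade-off the linked answer discusses.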
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No, other than `Schema` now being hashable when it has metadata.
* GitHub Issue: #47602
Lead-authored-by: Jonas Dedden <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
---
python/pyarrow/tests/test_schema.py | 22 ++++++++++++++++++++++
python/pyarrow/types.pxi | 3 ++-
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/python/pyarrow/tests/test_schema.py b/python/pyarrow/tests/test_schema.py
index a1197ed2d0..029e14ca16 100644
--- a/python/pyarrow/tests/test_schema.py
+++ b/python/pyarrow/tests/test_schema.py
@@ -482,6 +482,28 @@ def test_schema_set_field():
     assert s3.field(0).nullable is False
+def test_schema_hash_metadata():
+    fields = [
+        pa.field("foo", pa.int32()),
+    ]
+
+    schema1 = pa.schema(fields, metadata={b'foo': b'bar'})
+    schema2 = pa.schema(fields, metadata={b'foo': b'bar'})
+    schema3 = pa.schema(fields, metadata={b'foo_different': b'bar'})
+    schema4 = pa.schema(fields, metadata={b'foo': b'bar_different'})
+
+    assert hash(schema1) == hash(schema2)
+    assert hash(schema1) != hash(schema3)
+    assert hash(schema1) != hash(schema4)
+    assert hash(schema3) != hash(schema4)
+
+    schema_empty1 = pa.schema(fields, metadata={})
+    schema_empty2 = pa.schema(fields, metadata=None)
+
+    assert hash(schema_empty1) == hash(schema_empty2)
+    assert hash(schema_empty1) != hash(schema1)
+
+
 def test_schema_equals():
     fields = [
         pa.field('foo', pa.int32()),
diff --git a/python/pyarrow/types.pxi b/python/pyarrow/types.pxi
index 2212240b8b..7d9261cf85 100644
--- a/python/pyarrow/types.pxi
+++ b/python/pyarrow/types.pxi
@@ -2918,7 +2918,8 @@ cdef class Schema(_Weakrefable):
         return schema, (list(self), self.metadata)
     def __hash__(self):
-        return hash((tuple(self), self.metadata))
+        metadata = frozenset(self.metadata.items() if self.metadata else {})
+        return hash((tuple(self), metadata))
     def __sizeof__(self):
         size = 0