mosalx opened a new issue, #35360: URL: https://github.com/apache/arrow/issues/35360
### Describe the bug, including details regarding any error messages, version, and platform.

# Summary

When a pyarrow `ListArray` or `FixedSizeListArray` has a struct value type, it is possible for two equal scalars to have different hash values. This violates the contract for Python's `__hash__`, which states: "The only required property is that objects which compare equal have the same hash value" https://docs.python.org/3/reference/datamodel.html#object.__hash__

Below is the smallest reproducible example that demonstrates this issue. The example uses `FixedSizeListArray`, but `ListArray` is affected too.

# Environment

Windows 10
python=3.11.2
pyarrow=11.0.0

# Details

```python
import pyarrow as pa

# initial array
_type = pa.list_(pa.struct([('a', pa.int32())]), list_size=1)
array = pa.array([[{'a': 1}], [{'a': 1}], [{'a': 1}], None], type=_type)

# make a deep copy of the last two elements. This involves copying all array buffers
# and truncating unused bytes due to array offset. For simplicity, I am not copying all buffers
# (`field` array is not copied). This step was omitted to keep the example small
chunk = array[2:]
child = chunk.values[chunk.offset:]  # StructArray
field = child.field('a')  # Int32Array

# create a copy of `child`.
# Validity buffer could be set to None, it would not change the outcome
validity_buffer_child = pa.array(
    [True, False, False, False, False, False, False, False]
).buffers()[1]
child_copy = type(child).from_buffers(
    child.type,
    length=len(child),
    buffers=[validity_buffer_child],
    children=[field],
)
assert child_copy.equals(child)

# create a copy of `chunk` using `child_copy` made above
validity_buffer_chunk = pa.array(
    [True, False, False, False, False, False, False, False]
).buffers()[1]
chunk_copy = pa.FixedSizeListArray.from_buffers(
    type=chunk.type,
    length=len(chunk),
    buffers=[validity_buffer_chunk],
    children=[child_copy],
)
assert chunk_copy.equals(chunk)
```

Now we have two equal arrays whose first element is valid (not null). The equality check for the first element passes:

```python
assert chunk_copy[0] == chunk[0]  # Ok
```

But their hash values are different:

```python
assert hash(chunk_copy[0]) == hash(chunk[0])  # AssertionError
```

### Component(s)

Python
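
To illustrate why a violated hash contract matters in practice, here is a minimal sketch in plain Python. It does not use pyarrow, and `BrokenKey` is a hypothetical class invented for this illustration: it compares by value but hashes by a per-instance serial number, so equal objects get different hashes, as the scalars above do.

```python
import itertools

_serial = itertools.count()  # hypothetical per-instance counter used as the hash


class BrokenKey:
    """Hypothetical stand-in: equality by value, hash unrelated to value."""

    def __init__(self, value):
        self.value = value
        self._hash = next(_serial)

    def __eq__(self, other):
        return isinstance(other, BrokenKey) and self.value == other.value

    def __hash__(self):
        # Deliberately violates the contract: equal objects hash differently.
        return self._hash


a = BrokenKey(1)
b = BrokenKey(1)

equal = a == b                  # the two objects compare equal
same_hash = hash(a) == hash(b)  # but their hashes differ
found = b in {a}                # so a set lookup for an equal object misses
```

Under this assumption, `equal` is true while `same_hash` and `found` are both false, which is the kind of silent lookup failure that sets and dict keys built from such scalars would be exposed to.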
