mosalx opened a new issue, #35360:
URL: https://github.com/apache/arrow/issues/35360

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   # Summary
   When a pyarrow `ListArray` or `FixedSizeListArray` has a struct value type, two scalars that compare equal can end up with different hash values. This violates the contract for Python's `__hash__`, which states: "The only required property is that objects which compare equal have the same hash value".
   https://docs.python.org/3/reference/datamodel.html#object.__hash__
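
To illustrate why this contract matters, here is a minimal pure-Python sketch with no pyarrow involved. The `BrokenKey` class is hypothetical and exists only for this illustration: it compares by value but hashes by a per-instance serial number, which mimics the symptom reported here. Hash-based containers then treat equal objects as distinct.

```python
import itertools

_serial = itertools.count()


class BrokenKey:
    """Hypothetical class: equal by value, but hashed per instance."""

    def __init__(self, value):
        self.value = value
        self._id = next(_serial)  # unique per instance

    def __eq__(self, other):
        return isinstance(other, BrokenKey) and self.value == other.value

    def __hash__(self):
        # Deliberately wrong: ignores the value that __eq__ compares.
        return self._id


a = BrokenKey(1)
b = BrokenKey(1)

assert a == b                 # the objects compare equal
distinct = {a, b}             # a set is expected to collapse equal objects
lookup_hit = b in {a: "x"}    # a dict is expected to find an equal key
```

With mismatched hashes, `distinct` keeps both objects and `lookup_hit` is `False`, even though `a == b`. The same kind of surprise would apply to pyarrow scalars used as set members or dict keys.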
   
   Below is the smallest reproducible example that demonstrates this issue. This example uses `FixedSizeListArray`, but `ListArray` is affected too.
   
   # Environment
   Windows 10
   python=3.11.2
   pyarrow=11.0.0
   
   # Details
   ```python
   import pyarrow as pa
   
   # initial array
   _type = pa.list_(pa.struct([('a', pa.int32())]), list_size=1)
   array = pa.array([[{'a': 1}], [{'a': 1}], [{'a': 1}], None], type=_type)
   
   # Make a deep copy of the last two elements. A full deep copy would copy
   # every array buffer and truncate the bytes left unused because of the
   # array offset. To keep the example small, not all buffers are copied
   # here (the `field` array is reused as-is).
   chunk = array[2:]
   child = chunk.values[chunk.offset:]  # StructArray
   field = child.field('a')  # Int32Array
   
   # Create a copy of `child`. The validity buffer could be set to None; it
   # would not change the outcome.
   validity_buffer_child = pa.array(
       [True, False, False, False, False, False, False, False]
   ).buffers()[1]
   child_copy = type(child).from_buffers(
       child.type,
       length=len(child),
       buffers=[validity_buffer_child],
       children=[field],
   )
   assert child_copy.equals(child)
   
   # create a copy of `chunk` using `child_copy` made above
   validity_buffer_chunk = pa.array(
       [True, False, False, False, False, False, False, False]
   ).buffers()[1]
   chunk_copy = pa.FixedSizeListArray.from_buffers(
       type=chunk.type,
       length=len(chunk),
       buffers=[validity_buffer_chunk],
       children=[child_copy],
   )
   assert chunk_copy.equals(chunk)
   ```
   
   Now we have two equal arrays whose first element is valid (not null).
   The equality check for the first element passes:
   ```python
   assert chunk_copy[0] == chunk[0]  # Ok
   ```
   
   But their hash values are different:
   ```python
   assert hash(chunk_copy[0]) == hash(chunk[0])  # AssertionError
   ```
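
Until the hashes agree, one possible workaround is to hash a canonical Python representation of the value instead of the scalar itself. The sketch below is only an assumption about how that could look: `to_hashable` is a hypothetical helper (not a pyarrow API), and the inputs are plain-Python stand-ins for what `chunk[0].as_py()` and `chunk_copy[0].as_py()` would be expected to return for this type.

```python
def to_hashable(value):
    """Recursively convert lists and dicts into tuples so the result is hashable."""
    if isinstance(value, dict):
        # Sort by key so that dicts which compare equal give the same result.
        return tuple((k, to_hashable(v)) for k, v in sorted(value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable(v) for v in value)
    return value  # ints, strings, None, ... are already hashable


# Stand-ins for the `as_py()` output of the two equal scalars (assumed shape).
original = [{'a': 1}]
copy = [{'a': 1}]

key_original = to_hashable(original)
key_copy = to_hashable(copy)

assert key_original == key_copy
assert hash(key_original) == hash(key_copy)
```

This trades speed for predictability, since every scalar is converted to Python objects before hashing. It does not address the underlying inconsistency in the scalar hash itself.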
   
   ### Component(s)
   
   Python

