0x26res opened a new issue, #41351:
URL: https://github.com/apache/arrow/issues/41351

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I apologise in advance: this issue is contrived.
   
   I start with a chunked array whose type is a list of structs, for example `pa.list_(pa.struct([pa.field("value", pa.float64())]))`.
   
   I construct the chunked array in such a way that **the underlying values of the list are shared among the array chunks.**
   
   ```python
   import pyarrow as pa
   import pyarrow.compute as pc

   values = pa.StructArray.from_arrays(
       [pa.array([1, 2, 3, 4, 5, 6, 7])], ["values"]
   )
   
   my_array = pa.chunked_array(
       [
           pa.ListArray.from_arrays(
               [0, 1, 2, 3],
               values,
           ),
           pa.ListArray.from_arrays(
               [3, 4, 5, 6],
               values,
           ),
       ]
   )
   ``` 
   
   I then try to flatten/explode the list array:
   ```python
   flatten = pc.list_flatten(my_array)
   ```
   
   I then create a table from the chunks of the flattened array:
   
   ```python
   table = pa.Table.from_batches(
       pa.RecordBatch.from_struct_array(chunk) for chunk in flatten.iterchunks()
   )
   ```
   
   I then add a column to the table and call `to_batches`, which causes a segfault:
   
   ```python
   table.append_column("name", pa.repeat("foo", len(table))).to_batches()
   ```
   
   One thing I've noticed is that the first chunk of the `flatten` array has the wrong `str` representation:
   ```python
   assert (
       str(pa.RecordBatch.from_struct_array(flatten.chunks[0]))
       == "pyarrow.RecordBatch\nvalues: int64\n----\nvalues: [1,2,3,4,5,6,7]"
   )
   ```
   It should show `[1,2,3]`.
   
   
   Full example:
   
   ```python
   import pyarrow as pa
   import pyarrow.compute as pc
   
   pa.list_(pa.struct([pa.field("value", pa.float64())]))
   
   
   def test_wrong():
       values = pa.StructArray.from_arrays(
           [pa.array([1, 2, 3, 4, 5, 6, 7])], ["values"]
       )
   
       my_array = pa.chunked_array(
           [
               pa.ListArray.from_arrays(
                   [0, 1, 2, 3],
                   values,
               ),
               pa.ListArray.from_arrays(
                   [3, 4, 5, 6],
                   values,
               ),
           ]
       )
   
       flatten = pc.list_flatten(my_array)
       assert flatten.to_pylist() == [
           {"values": 1},
           {"values": 2},
           {"values": 3},
           {"values": 4},
           {"values": 5},
           {"values": 6},
       ]
   
       assert pa.RecordBatch.from_struct_array(flatten.chunks[0]).to_pylist() == [
           {"values": 1},
           {"values": 2},
           {"values": 3},
       ]
       assert (
           str(pa.RecordBatch.from_struct_array(flatten.chunks[0]))
           == "pyarrow.RecordBatch\nvalues: int64\n----\nvalues: [1,2,3,4,5,6,7]"
       )
   
       pa.Table.from_batches([pa.RecordBatch.from_struct_array(flatten.chunks[0])])
   
       table = pa.Table.from_batches(
           pa.RecordBatch.from_struct_array(chunk) for chunk in flatten.iterchunks()
       )
       table = table.append_column("name", pa.repeat("foo", len(table)))
       table.to_batches()
   ```
   
   A bit of context:
   - Tested with `pyarrow==16.0.0`.
   - This came up when exploding and filtering some data coming from Parquet.
   - The example does not reproduce the problem if you omit the last element of the underlying values (`7`).
   
   ### Component(s)
   
   Python

