0x26res opened a new issue, #41351: URL: https://github.com/apache/arrow/issues/41351
### Describe the bug, including details regarding any error messages, version, and platform.

I apologise in advance: this issue is contrived.

I start with a chunked array of lists of structs, for example `pa.list_(pa.struct([pa.field("value", pa.float64())]))`. I construct the chunked array in such a way that **the underlying values of the list are shared among the array chunks.**

```python
import pyarrow as pa
import pyarrow.compute as pc

values = pa.StructArray.from_arrays([pa.array([1, 2, 3, 4, 5, 6, 7])], ["values"])
my_array = pa.chunked_array(
    [
        pa.ListArray.from_arrays(
            [0, 1, 2, 3],
            values,
        ),
        pa.ListArray.from_arrays(
            [3, 4, 5, 6],
            values,
        ),
    ]
)
```

I then try to flatten/explode the list array:

```python
flatten = pc.list_flatten(my_array)
```

And create a table from the chunks of the flattened array:

```python
table = pa.Table.from_batches(
    (pa.RecordBatch.from_struct_array(chunk) for chunk in flatten.iterchunks()),
)
```

I then add a column to the table and call `to_batches`, which causes a segfault:

```python
table.append_column("name", pa.repeat("foo", len(table))).to_batches()
```

One thing I've noticed is that the first chunk of the `flatten` array has the wrong `str` representation:

```python
assert (
    str(pa.RecordBatch.from_struct_array(flatten.chunks[0]))
    == "pyarrow.RecordBatch\nvalues: int64\n----\nvalues: [1,2,3,4,5,6,7]"
)
```

It should show `[1,2,3]`.

Full example:

```python
import pyarrow as pa
import pyarrow.compute as pc

pa.list_(pa.struct([pa.field("value", pa.float64())]))


def test_wrong():
    # Both list chunks share the same underlying struct values.
    values = pa.StructArray.from_arrays([pa.array([1, 2, 3, 4, 5, 6, 7])], ["values"])
    my_array = pa.chunked_array(
        [
            pa.ListArray.from_arrays(
                [0, 1, 2, 3],
                values,
            ),
            pa.ListArray.from_arrays(
                [3, 4, 5, 6],
                values,
            ),
        ]
    )

    flatten = pc.list_flatten(my_array)
    assert flatten.to_pylist() == [
        {"values": 1},
        {"values": 2},
        {"values": 3},
        {"values": 4},
        {"values": 5},
        {"values": 6},
    ]

    assert pa.RecordBatch.from_struct_array(flatten.chunks[0]).to_pylist() == [
        {"values": 1},
        {"values": 2},
        {"values": 3},
    ]

    # Passes, but the representation is wrong: it should show [1,2,3].
    assert (
        str(pa.RecordBatch.from_struct_array(flatten.chunks[0]))
        == "pyarrow.RecordBatch\nvalues: int64\n----\nvalues: [1,2,3,4,5,6,7]"
    )

    pa.Table.from_batches([pa.RecordBatch.from_struct_array(flatten.chunks[0])])

    table = pa.Table.from_batches(
        (pa.RecordBatch.from_struct_array(chunk) for chunk in flatten.iterchunks()),
    )
    table = table.append_column("name", pa.repeat("foo", len(table)))
    # Segfaults here.
    table.to_batches()
```

A bit of context:

- Tested with `pyarrow==16.0.0`.
- This came up when exploding and filtering some data coming from Parquet.
- The example no longer triggers the problem if you omit the last element of the underlying values (`7`).

### Component(s)

Python
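
For reference, the expected content of each flattened chunk can be modelled without pyarrow. The sketch below is a plain-Python model of the list layout (list `i` spans `values[offsets[i]:offsets[i + 1]]`), not pyarrow's implementation; the helper names `list_elements` and `flatten_chunk` are hypothetical and exist only for illustration. It shows that each chunk should cover only the window of the shared values that its offsets reference, so the trailing `7` should never appear.

```python
from typing import List, Sequence


def list_elements(offsets: Sequence[int], values: Sequence[dict]) -> List[List[dict]]:
    """Model of a list array: element i is values[offsets[i]:offsets[i + 1]]."""
    return [list(values[offsets[i]:offsets[i + 1]]) for i in range(len(offsets) - 1)]


def flatten_chunk(offsets: Sequence[int], values: Sequence[dict]) -> List[dict]:
    """Model of flattening one chunk: only the window covered by its offsets."""
    return list(values[offsets[0]:offsets[-1]])


# Shared child values, as in the report: seven structs, the last one unreferenced.
shared_values = [{"values": v} for v in [1, 2, 3, 4, 5, 6, 7]]

# Offsets of the two list chunks from the report.
chunk_offsets = [[0, 1, 2, 3], [3, 4, 5, 6]]

flattened = [flatten_chunk(offsets, shared_values) for offsets in chunk_offsets]
print(flattened[0])
print(flattened[1])
```

In this model the first chunk holds the structs with values 1 to 3 and the second those with values 4 to 6, which matches the `[1,2,3]` expected for the first record batch above.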