westonpace commented on issue #34639:
URL: https://github.com/apache/arrow/issues/34639#issuecomment-1536510963
I've reopened this so we can verify, but I think the current behavior is
actually correct. That said, I think there are separate bugs in
`to_struct_array` and `to_pandas` (:face_exhaling:)
```
> pa.RecordBatch.from_struct_array(standard)
```
This will give you a record batch that has length 1 with two child arrays
that each have length 2. This is allowed because it lets us use zero-copy.
```
>>> x = pa.RecordBatch.from_struct_array(standard)
>>> print(x) # Sadly, we don't print the contents here
pyarrow.RecordBatch
col1: double
col2: string
>>> print(x.num_rows) # This is correct
1
>>> print(x.column(0)) # This is arguably correct but misleading
[
1,
2
]
>>> print(x.to_pylist()) # This is correct
[{'col1': 1.0, 'col2': 'a'}]
>>> print(x.to_struct_array()) # This is wrong
-- is_valid: all not null
-- child 0 type: double
[
1,
2
]
-- child 1 type: string
[
"a",
"b"
]
>>> print(x.to_pandas()) # this is also wrong
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 852, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 2506, in pyarrow.lib.RecordBatch._to_pandas
  File "pyarrow/table.pxi", line 4075, in pyarrow.lib.Table._to_pandas
  File "/home/pace/dev/arrow/python/pyarrow/pandas_compat.py", line 823, in table_to_blockmanager
    return BlockManager(blocks, axes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pace/miniconda3/envs/conbench3/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 1040, in __init__
    self._verify_integrity()
  File "/home/pace/miniconda3/envs/conbench3/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 1047, in _verify_integrity
    raise construction_error(tot_items, block.shape[1:], self.axes)
ValueError: Shape of passed values is (2, 2), indices imply (1, 2)
```
I will open two new issues, one for `to_struct_array` and one for
`to_pandas`. Arguably, we should also modify `to_batches` to push "short
lengths" into the arrays themselves. I'll have to ask on the mailing list
whether it's legal for a record batch and its arrays to have different
lengths.