benedikt-grl opened a new issue, #48781:
URL: https://github.com/apache/arrow/issues/48781
### Describe the bug, including details regarding any error messages, version, and platform.
When I try to create a RecordBatch from a pylist of large objects, `RecordBatch.from_pylist` raises *TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array*.
MWE:
```python
import pyarrow as pa
import numpy as np
# Create a random array of shape [3, 720, 1280]
rng = np.random.default_rng(42)
image = rng.integers(low=0, high=255, size=(3, 720, 1280))
# Wrap into dict
row = {
    "image": {
        "data": image.tobytes(),
        "shape": image.shape,
    }
}
# Define schema
schema = pa.schema({
    "image": pa.struct({"data": pa.binary(), "shape": pa.list_(pa.uint16(), 3)})
})
# Convert to record batch
num_rows = 98
pylist = [row] * num_rows
batch = pa.RecordBatch.from_pylist(pylist, schema=schema)
```
When `num_rows` is reduced to `97`, the example above runs without any error.
I suspect the issue is related to the total size in bytes of the pylist. Each image occupies 3 * 720 * 1280 * 8 = 22,118,400 bytes (`rng.integers` defaults to int64). 98 images total 2,167,603,200 bytes; 97 images total 2,145,484,800 bytes. 2^31 = 2,147,483,648, which lies right between these two numbers. Presumably `pa.binary()` stores offsets as 32-bit integers, so a single binary array is capped at 2^31 - 1 bytes; past that, the conversion falls back to a ChunkedArray, which a RecordBatch column cannot hold.
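A quick sanity check of the arithmetic (the factor 8 assumes the default int64 dtype of `rng.integers`):
```python
per_image = 3 * 720 * 1280 * 8  # 22,118,400 bytes per int64 image
print(98 * per_image)           # 2167603200 -> just above 2**31
print(97 * per_image)           # 2145484800 -> just below 2**31
print(2**31)                    # 2147483648
```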
While the images in this MWE consume more bytes than necessary (int64 where uint8 would do), in my actual use case I cannot reduce the payload below this size.
Is there a simple way to solve this issue?
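For what it's worth, one workaround that may apply here (a sketch on my side, assuming the 32-bit offset limit of `pa.binary()` is indeed the cause): `pa.large_binary()` uses 64-bit offsets and should not hit the 2 GiB cap.
```python
import numpy as np
import pyarrow as pa

rng = np.random.default_rng(42)
image = rng.integers(low=0, high=255, size=(3, 720, 1280))
row = {"image": {"data": image.tobytes(), "shape": image.shape}}

# Assumption: large_binary (64-bit offsets) avoids the 2**31-byte limit
# that pa.binary() (32-bit offsets) appears to hit in the MWE above.
schema = pa.schema({
    "image": pa.struct({
        "data": pa.large_binary(),
        "shape": pa.list_(pa.uint16(), 3),
    })
})

pylist = [row] * 98  # the same size that fails with pa.binary()
batch = pa.RecordBatch.from_pylist(pylist, schema=schema)
print(batch.num_rows)  # expected: 98
```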
### Component(s)
Python