benedikt-grl opened a new issue, #48781:
URL: https://github.com/apache/arrow/issues/48781

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   When I try to create a RecordBatch from a list of large objects, `RecordBatch.from_pylist` raises a *TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array*.
   
   MWE:
   ```python
   import pyarrow as pa
   import numpy as np
   
   
   # Create a random array of shape [3, 720, 1280]
   rng = np.random.default_rng(42)
   image = rng.integers(low=0, high=255, size=(3, 720, 1280))
   
   # Wrap into dict
   row = {
       "image": {
           "data": image.tobytes(),
           "shape": image.shape,
       }
   }
   
   # Define schema
   schema = pa.schema({
       "image": pa.struct({"data": pa.binary(), "shape": pa.list_(pa.uint16(), 
3)})
   })
   
   # Convert to record batch
   num_rows = 98
   pylist = [row] * num_rows
   batch = pa.RecordBatch.from_pylist(pylist, schema=schema)
   ```
   
   When `num_rows` is reduced to `97`, the example above runs without any error.
   
   I suspect the issue is related to the total size in bytes of the pylist. Each image has 3 * 720 * 1280 * 8 = 22,118,400 bytes.
   98 images have 2,167,603,200 bytes.
   97 images have 2,145,484,800 bytes.
   2^31 is 2,147,483,648, which lies right between these two numbers.
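
   The threshold can be verified with a quick back-of-the-envelope check:

   ```python
   # Each image holds 3 * 720 * 1280 int64 values at 8 bytes each
   # (np.random.default_rng().integers returns int64 by default)
   bytes_per_image = 3 * 720 * 1280 * 8

   # 97 images stay below 2**31 bytes, 98 exceed it
   assert 97 * bytes_per_image < 2**31 < 98 * bytes_per_image
   ```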
   
   While in this MWE the images consume more bytes than necessary, in my actual use case I cannot use fewer bytes.
   Is there a simple way to solve this issue?
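
   For what it's worth, switching the field from `pa.binary()` to `pa.large_binary()` seems to avoid the error in my testing. I assume this is because `pa.binary()` uses 32-bit offsets, which cap a single array at 2^31 - 1 bytes, while `pa.large_binary()` uses 64-bit offsets:

   ```python
   import pyarrow as pa

   # Same schema as in the MWE, but with large_binary for the image bytes;
   # its 64-bit offsets are not limited to 2**31 - 1 bytes per array
   schema = pa.schema({
       "image": pa.struct({
           "data": pa.large_binary(),
           "shape": pa.list_(pa.uint16(), 3),
       })
   })
   ```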
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
