jorisvandenbossche commented on issue #41936: URL: https://github.com/apache/arrow/issues/41936#issuecomment-2160965545
@PatrikBernhard the issue here is that your example pandas DataFrame has a chunked column (because of the `concat` step), while a RecordBatch is a data structure in which each column is a single contiguous array. In pyarrow, that is exactly the difference between a `RecordBatch` and a `Table`: a RecordBatch is a collection of `Array` objects, while a Table is a collection of `ChunkedArray` objects. That is why `pa.Table.from_pandas(concat_df)` works fine.

Historically, pandas DataFrames always had columns backed by a single non-chunked array under the hood, and that is the reason `RecordBatch.from_pandas` currently does not support chunked columns. I am not entirely sure what the best solution is: keep raising the error (but maybe make it more informative, or document this behaviour better), since people might not expect a copy in this conversion step, or automatically convert the chunked array to a contiguous array.

As a comparison, directly constructing a RecordBatch from a ChunkedArray gives the same error:

```python
In [10]: arr = pa.chunked_array([pa.array([1], pa.int32()), pa.array([2], pa.int32())])

In [11]: pa.RecordBatch.from_arrays([arr], names=["col"])
...
TypeError: Cannot convert pyarrow.lib.ChunkedArray to pyarrow.lib.Array
```
