jorisvandenbossche opened a new issue, #34755:
URL: https://github.com/apache/arrow/issues/34755
From https://github.com/apache/arrow/pull/34289#pullrequestreview-1355094099

Currently, the `pyarrow.array(..)` constructor is meant to create an Array object, but it can return a ChunkedArray instead in two cases: 1) the object is too big to fit into a single array (e.g. the offsets get too large for a single StringArray), and 2) the object has an `__arrow_array__` method that returns a ChunkedArray.

However, if this starts to happen more and more, it can be annoying for places in our code where we assume `pyarrow.array(..)` gives us an Array; see for example https://github.com/apache/arrow/issues/33727#issuecomment-1387323624. For this specific case, we updated `pyarrow.array(..)` to special-case chunked arrays with only 1 chunk and unpack them into a normal Array, since that's an easy zero-copy conversion (done in https://github.com/apache/arrow/pull/34289).

Longer term, what do we want to do with `pyarrow.array(..)` returning chunked arrays? For example, passing a pandas.Series to `pyarrow.array(..)` can easily give a ChunkedArray:

```python
>>> import pandas as pd
>>> import pyarrow as pa
>>> arr = pa.chunked_array([[1, 2], [3, 4]])
>>> ser = pd.Series(arr, dtype=pd.ArrowDtype(arr.type))
>>> ser
0    1
1    2
2    3
3    4
dtype: int64[pyarrow]
>>> pa.array(ser)
<pyarrow.lib.ChunkedArray object at 0x7effe3f9ea90>
[
  [
    1,
    2
  ],
  [
    3,
    4
  ]
]
```

Some thoughts:

- To ensure you can rely more on `pa.array(..)` to actually return an Array, we could concatenate chunked arrays in the example above, and then users could use `pa.asarray(..)` to get either an Array or a ChunkedArray.
- Keep `pa.array(..)` flexible, returning either an Array or a ChunkedArray, but add another, stricter function, or a helper that ensures we always have an Array and concatenates chunks if necessary, which could then be used internally where needed.
