jorisvandenbossche opened a new issue, #34755:
URL: https://github.com/apache/arrow/issues/34755

   From https://github.com/apache/arrow/pull/34289#pullrequestreview-1355094099
   
   Currently, the `pyarrow.array(..)` constructor is meant to create an Array 
object, but it can return a ChunkedArray instead in two cases: 1) the object is 
too big to fit into a single array (e.g. the offsets get too large for a single 
StringArray), and 2) the object has an `__arrow_array__` method that returns a 
ChunkedArray.
   
   However, if this starts to happen more and more, that can be annoying for 
places in our code where we assume `pyarrow.array(..)` gives us an Array, see 
for example 
https://github.com/apache/arrow/issues/33727#issuecomment-1387323624. For this 
specific case, we updated `pyarrow.array(..)` to special-case chunked arrays 
with only one chunk and unpack them into a normal Array, since that's an easy 
zero-copy conversion (done in https://github.com/apache/arrow/pull/34289). 
   
   Longer term, what do we want to do with `pyarrow.array(..)` returning 
chunked arrays? 
   
   For example, passing a pandas.Series to `pyarrow.array(..)` can easily give 
a ChunkedArray:
   
   ```python
   >>> arr = pa.chunked_array([[1, 2], [3, 4]])
   >>> ser = pd.Series(arr, dtype=pd.ArrowDtype(arr.type))
   >>> ser
   0    1
   1    2
   2    3
   3    4
   dtype: int64[pyarrow]
   >>> pa.array(ser)
   <pyarrow.lib.ChunkedArray object at 0x7effe3f9ea90>
   [
     [
       1,
       2
     ],
     [
       3,
       4
     ]
   ]
   ```
   
   Some thoughts:
   
   - To ensure you can rely more on `pa.array(..)` to actually return an Array, 
we could concat chunked arrays in the example above, and then users could use 
`pa.asarray(..)` to get either Array or ChunkedArray
   - Keep `pa.array(..)` flexible, returning either an Array or a ChunkedArray, 
but add another function that is more strict, or a helper that ensures we 
always have an Array and concatenates chunks if necessary, which could then be 
used internally where needed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
