Hi John,

The documentation says

    array : pyarrow.Array or pyarrow.ChunkedArray
        A ChunkedArray instead of an Array is returned if:

        - the object data overflowed binary storage.
        - the object's ``__arrow_array__`` protocol method returned a chunked
          array.

Overflowing binary storage means exceeding the limit of 2^31 - 1 bytes
for BinaryType or StringType/UTF8. We thought this was better than
failing, since the output of pyarrow.array is often used to instantiate
a pyarrow.Table, which accepts a ChunkedArray without complaint.

Depending on your input data, you might be able to guess whether the
overflow will occur, but it will be application-dependent.
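
As a rough illustration of such a guess, one could sum the encoded byte
lengths of the input values and compare the total against the 2^31 - 1
figure quoted above. This is only a sketch: the helper names and the
`limit` parameter are hypothetical, not part of pyarrow, and the
estimate assumes the values are plain str/bytes objects converted as
UTF-8.

```python
# Hypothetical helpers, not a pyarrow API. They only estimate the size of the
# value data, using the 2^31 - 1 byte limit mentioned in this thread.
BINARY_LIMIT = 2**31 - 1


def estimated_binary_bytes(values, encoding="utf-8"):
    """Sum the byte lengths of str/bytes values, skipping None (nulls)."""
    total = 0
    for value in values:
        if value is None:
            continue  # nulls contribute no value bytes
        if isinstance(value, str):
            total += len(value.encode(encoding))
        else:
            total += len(value)  # bytes, bytearray, memoryview
    return total


def may_overflow_binary(values, limit=BINARY_LIMIT):
    """Return True if the estimated value bytes exceed the given limit."""
    return estimated_binary_bytes(values) > limit
```

Whatever the estimate says, checking the result with
isinstance(result, pyarrow.ChunkedArray) after calling pyarrow.array
remains the dependable test.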

- Wes

On Tue, Dec 3, 2019 at 10:51 AM John Muehlhausen <j...@jgm.org> wrote:
>
> Given input data and a type, how do we predict whether array() will produce
> ChunkedArray?
>
> I figure the formula involves:
> - the length of input
> - the type, and max length (to be conservative) for variable length types
> - some constant(s) that Arrow knows internally... that may change in the
> future?
>
> Should there be an API to make this easy?  Am I missing one that already
> exists?
>
> Thanks,
> John
