[ https://issues.apache.org/jira/browse/ARROW-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443776#comment-16443776 ]
Antoine Pitrou commented on ARROW-2476:
---------------------------------------
{quote}Are the array lengths `arrow format` specific or implementation
specific?{quote}
I don't know. But array length is not part of array data, only metadata, so I
guess implementations are free to represent it as they prefer?
{quote}The [list offset
bulletpoint|https://github.com/apache/arrow/blob/master/format/Layout.md#list-type]
in the format is also misleading.{quote}
Can you explain what looks misleading to you?
> [Python/Question] Maximum length of an Array created from ndarray
> -----------------------------------------------------------------
>
> Key: ARROW-2476
> URL: https://issues.apache.org/jira/browse/ARROW-2476
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Krisztian Szucs
> Priority: Minor
>
> The format
> [states|https://github.com/apache/arrow/blob/master/format/Layout.md#array-lengths]
> that the maximum array length is 2^31 - 1. However, the following Python snippet
> creates an Arrow array of length 2**32:
> {code:python}
> import numpy as np
> import pyarrow as pa
> a = np.ones((2**32,), dtype='int8')  # 2**32 one-byte elements (about 4 GiB)
> A = pa.Array.from_pandas(a)
> type(A)
> {code}
> {code}pyarrow.lib.Int8Array{code}
> Based on the layout specification, I'd expect a ChunkedArray of three Int8Arrays
> with lengths:
> [2^31 - 1, 2^31 - 1, 2]
> If that is the expected behavior, is there any documentation for it?
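
For reference, the chunk lengths quoted above follow from splitting 2**32 elements into pieces of at most 2^31 - 1 elements each. A minimal sketch of that arithmetic in plain Python; `chunk_lengths` is a hypothetical helper for illustration, not a pyarrow function, and nothing here implies that pyarrow performs such a split:

```python
def chunk_lengths(total, max_len=2**31 - 1):
    """Split `total` elements into consecutive chunks of at most `max_len`."""
    if total < 0 or max_len <= 0:
        raise ValueError("total must be >= 0 and max_len must be > 0")
    # Number of full chunks, plus whatever is left over for a final short chunk.
    full, rest = divmod(total, max_len)
    return [max_len] * full + ([rest] if rest else [])


lengths = chunk_lengths(2**32)
print(lengths)  # [2147483647, 2147483647, 2]
```

Two full chunks cover 2 * (2^31 - 1) = 2^32 - 2 elements, which leaves 2 elements for the third chunk.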