[ https://issues.apache.org/jira/browse/ARROW-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972155#comment-16972155 ]
Joris Van den Bossche commented on ARROW-7066:
----------------------------------------------

I still don't fully like returning a chunked array from {{pa.array}}, but I also don't see an easy alternative for getting the roundtrip working for, e.g., fletcher, which uses chunked arrays. (The alternative would be an "internal" version of {{pa.array(..)}} that allows this while keeping the public one strict, but that is also rather ugly.)

I will add a documentation update to the current open PR.

> [Python] support returning ChunkedArray from __arrow_array__ ?
> --------------------------------------------------------------
>
> Key: ARROW-7066
> URL: https://issues.apache.org/jira/browse/ARROW-7066
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> The {{\_\_arrow_array\_\_}} protocol was added so that custom objects can
> define how they should be converted to a pyarrow Array (similar to numpy's
> {{\_\_array\_\_}}). This is then also used to support converting pandas
> DataFrames with columns using pandas' ExtensionArrays to a pyarrow Table (if
> the pandas ExtensionArray, such as the nullable integer type, implements this
> {{\_\_arrow_array\_\_}} method).
>
> This last use case could also be useful for fletcher
> (https://github.com/xhochy/fletcher/, a package that implements pandas
> ExtensionArrays that wrap pyarrow arrays, so they can be stored as-is in a
> pandas DataFrame).
> However, fletcher stores ChunkedArrays in the ExtensionArray / the columns of
> a pandas DataFrame (to have a better mapping with a Table, where the columns
> also consist of chunked arrays), whereas we currently require that the return
> value of {{\_\_arrow_array\_\_}} is a pyarrow.Array.
>
> So I was wondering: could we relax this constraint and also allow
> ChunkedArray as return value?
>
> However, this protocol is currently called in the {{pa.array(..)}} function,
> which probably should keep returning an Array (and not ChunkedArray in
> certain cases).
>
> cc [~uwe]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)