[
https://issues.apache.org/jira/browse/ARROW-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neal Richardson reassigned ARROW-7066:
--------------------------------------
Assignee: Joris Van den Bossche
> [Python] support returning ChunkedArray from __arrow_array__ ?
> --------------------------------------------------------------
>
> Key: ARROW-7066
> URL: https://issues.apache.org/jira/browse/ARROW-7066
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Joris Van den Bossche
> Assignee: Joris Van den Bossche
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.0.0
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The {{\_\_arrow_array\_\_}} protocol was added so that custom objects can
> define how they should be converted to a pyarrow Array (similar to numpy's
> {{\_\_array\_\_}}). This is then also used to support converting pandas
> DataFrames with columns using pandas' ExtensionArrays to a pyarrow Table (if
> the pandas ExtensionArray, such as the nullable integer type, implements this
> {{\_\_arrow_array\_\_}} method).
> This last use case could also be useful for fletcher
> (https://github.com/xhochy/fletcher/, a package that implements pandas
> ExtensionArrays that wrap pyarrow arrays, so they can be stored as is in a
> pandas DataFrame).
> However, fletcher stores ChunkedArrays in its ExtensionArrays / the columns of a
> pandas DataFrame (to have a better mapping with a Table, where the columns
> also consist of chunked arrays), whereas we currently require that the return
> value of {{\_\_arrow_array\_\_}} is a pyarrow.Array.
> So I was wondering: could we relax this constraint and also allow
> ChunkedArray as return value?
> However, this protocol is currently called in the {{pa.array(..)}} function,
> which should probably keep returning an Array (and not a ChunkedArray in
> certain cases).
> cc [~uwe]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)