[
https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834100#comment-16834100
]
Joris Van den Bossche commented on ARROW-2428:
----------------------------------------------
[~xhochy] did you already have a specific hook in mind, or did you try
something specific at the AHL hackathon?
One way might be to allow the user to specify the target dtypes in
{{to_pandas}} (optionally, on a per-column basis). If an ExtensionDtype
instance is passed there, Arrow could delegate the conversion of the Arrow
array to a pandas ExtensionArray to the ExtensionDtype/Array class itself.
Similarly, if we start storing the name of the ExtensionDtype in the pandas
metadata, we could also automatically re-create the dtype from that name, so
that by default the user would not need to pass it explicitly.
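
A rough sketch of that metadata lookup, again in plain Python and under stated assumptions: the registry, the {{extension_dtype}} metadata key, the {{from_arrow}} method and {{PeriodLikeDtype}} are all hypothetical and only illustrate the mechanism. The actual pandas metadata layout would need to be extended in some agreed way.

```python
import json

# Hypothetical registry mapping a dtype name to the class that implements it.
DTYPE_REGISTRY = {}


def register_dtype(cls):
    """Class decorator that records a dtype class under its name."""
    DTYPE_REGISTRY[cls.name] = cls
    return cls


@register_dtype
class PeriodLikeDtype:
    """Stand-in for a pandas ExtensionDtype."""

    name = "period_like"

    def from_arrow(self, column):
        # Hypothetical conversion hook; tags the values with the dtype name.
        return (self.name, list(column))


def dtypes_from_metadata(metadata_json):
    """Re-create dtype instances for columns whose metadata names one."""
    meta = json.loads(metadata_json)
    dtypes = {}
    for col in meta["columns"]:
        # "extension_dtype" is an assumed key, not part of today's metadata.
        cls = DTYPE_REGISTRY.get(col.get("extension_dtype"))
        if cls is not None:
            dtypes[col["name"]] = cls()
    return dtypes


metadata = json.dumps({
    "columns": [
        {"name": "a", "extension_dtype": "period_like"},
        {"name": "b", "extension_dtype": None},
    ]
})
recovered = dtypes_from_metadata(metadata)
```

Columns without a registered name are simply skipped, so they would fall back to the default conversion. An explicitly passed dtype could still take precedence over the one recovered from the metadata.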
See also the discussion in https://github.com/pandas-dev/pandas/issues/20612
> [Python] Support ExtensionArrays in to_pandas conversion
> --------------------------------------------------------
>
> Key: ARROW-2428
> URL: https://issues.apache.org/jira/browse/ARROW-2428
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Uwe L. Korn
> Priority: Major
> Fix For: 1.0.0
>
>
> With the next release of Pandas, it will be possible to define custom column
> types that back a {{pandas.Series}}. Thus we will not be able to cover all
> possible column types in the {{to_pandas}} conversion by default as we won't
> be aware of all extension arrays.
> To enable users to create {{ExtensionArray}} instances from Arrow columns in
> the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}}
> call where they can overload the default conversion routines with the ones
> that produce their {{ExtensionArray}} instances.
> This should avoid the additional copies we currently incur when we first
> convert the Arrow column into a default Pandas column (probably of object
> type) and the user afterwards converts it to a more efficient
> {{ExtensionArray}}. This hook will be especially useful when you build
> {{ExtensionArrays}} where the storage is backed by Arrow.
> The meta-issue that tracks the implementation inside of Pandas is:
> https://github.com/pandas-dev/pandas/issues/19696