[jira] [Comment Edited] (ARROW-2428) [Python] Add API to map Arrow types (including extension types) to pandas ExtensionArray instances for to_pandas conversions

Joris Van den Bossche (Jira) Sat, 07 Sep 2019 10:56:06 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16924959#comment-16924959
 ]


Joris Van den Bossche edited comment on ARROW-2428 at 9/7/19 5:55 PM:
----------------------------------------------------------------------

> It seems like the pandas glue can be part of the Python-side ExtensionType 
> implementation,

Yes, it might also need to interact with the pandas extension type, since 
otherwise it would only solve part of the problem. I think it would be nice if 
it were also possible to create a pandas ExtensionArray without having a pyarrow 
ExtensionType (the "case 1" described above), e.g. to roundtrip pandas' nullable 
integer type or a future string dtype (neither of which needs an Arrow 
ExtensionType).
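
To make the nullable-integer case concrete: a plain Arrow int64 array with nulls 
already carries what pandas' nullable integer array stores (the values plus a 
missing-value mask), so no Arrow ExtensionType is needed to rebuild it. The 
sketch below is dependency-free and uses hypothetical stand-ins 
({{ArrowInt64Array}}, {{to_nullable_int}}) rather than real pyarrow or pandas 
objects; it only illustrates the data mapping, not any actual conversion API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ArrowInt64Array:
    """Hypothetical stand-in for a plain (non-extension) Arrow int64 array."""
    values: List[int]      # physical values; slots under a null are arbitrary
    validity: List[bool]   # True = valid, mirroring Arrow's validity bitmap


def to_nullable_int(arr: ArrowInt64Array) -> Tuple[List[int], List[bool]]:
    """Build (data, mask) the way a nullable integer array stores it.

    pandas' nullable integer array marks missing entries with mask=True,
    which is the inverse of Arrow's validity bitmap.
    """
    data = list(arr.values)                       # values reused, no object boxing
    mask = [not valid for valid in arr.validity]  # invert validity -> missing mask
    return data, mask


def to_python_list(data: List[int], mask: List[bool]) -> List[Optional[int]]:
    """Materialize for display, using None where the mask is set."""
    return [None if m else v for v, m in zip(data, mask)]


arr = ArrowInt64Array(values=[1, 0, 3], validity=[True, False, True])
data, mask = to_nullable_int(arr)
print(to_python_list(data, mask))  # [1, None, 3]
```

The point of the sketch is that the integer values never pass through Python 
objects, which is what a direct Arrow-to-ExtensionArray path would preserve.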

If we want to support that, one option could be a 
{{PandasExtensionDtype.__constructor_from_arrow__}} that knows how to create an 
ExtensionArray from an Arrow array, plus a 
{{pyarrow.ExtensionType.__pandas_dtype__}} that points to the pandas extension 
dtype to use for a given pyarrow ExtensionType.
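
A rough sketch of how a {{to_pandas}} conversion could dispatch on the two 
proposed hooks. Everything here is a hypothetical, dependency-free stand-in 
(the classes, {{convert_column}}, and treating {{__pandas_dtype__}} as a method 
rather than an attribute are assumptions); it is not the pyarrow or pandas API.

```python
from typing import Any, List, Optional


class ArrowArray:
    """Hypothetical stand-in for an Arrow array: a type plus storage values."""

    def __init__(self, type_: Any, storage: List[Any]) -> None:
        self.type = type_
        self.storage = storage


class MyExtensionArray:
    """Hypothetical pandas ExtensionArray built from Arrow storage."""

    def __init__(self, data: List[Any]) -> None:
        self.data = data


class MyPandasDtype:
    """Stand-in for a pandas ExtensionDtype implementing the proposed hook."""

    def __constructor_from_arrow__(self, arrow_array: ArrowArray) -> MyExtensionArray:
        # The pandas side owns the logic for building its own array type
        return MyExtensionArray(list(arrow_array.storage))


class MyArrowExtensionType:
    """Stand-in for a pyarrow ExtensionType pointing at its pandas dtype."""

    def __pandas_dtype__(self) -> MyPandasDtype:  # assumed to be a method
        return MyPandasDtype()


class PlainArrowType:
    """Stand-in for a built-in Arrow type with no pandas hook."""


def convert_column(arr: ArrowArray, pandas_dtype: Optional[Any] = None) -> Any:
    """Hypothetical per-column dispatch inside a to_pandas conversion."""
    # "Case 1": the caller already knows the pandas dtype, no Arrow extension type
    # Otherwise: ask the Arrow extension type which pandas dtype it maps to
    if pandas_dtype is None and hasattr(arr.type, "__pandas_dtype__"):
        pandas_dtype = arr.type.__pandas_dtype__()
    if pandas_dtype is not None and hasattr(pandas_dtype, "__constructor_from_arrow__"):
        return pandas_dtype.__constructor_from_arrow__(arr)
    # No hook found: fall back to the default conversion (here a plain list)
    return list(arr.storage)


ext = convert_column(ArrowArray(MyArrowExtensionType(), ["a", "b"]))
print(type(ext).__name__, ext.data)  # MyExtensionArray ['a', 'b']
```

Splitting the work this way keeps pyarrow ignorant of the ExtensionArray 
internals: pyarrow only needs to find a dtype, and the dtype builds the array.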


was (Author: jorisvandenbossche):
> It seems like the pandas glue can be part of the Python-side ExtensionType 
> implementation,

Yes, but that might only solve part of the problem. I think it would be nice if 
it were also possible to create a pandas ExtensionArray without having a pyarrow 
ExtensionType (the "case 1" described above), e.g. to roundtrip pandas' nullable 
integer type or a future string dtype (neither of which needs an Arrow 
ExtensionType).

If we want to support that, one option could be a 
{{PandasExtensionDtype.__constructor_from_arrow__}} that knows how to create an 
ExtensionArray from an Arrow array, plus a 
{{pyarrow.ExtensionType.__pandas_dtype__}} that points to the pandas extension 
dtype to use for a given pyarrow ExtensionType.

> [Python] Add API to map Arrow types (including extension types) to pandas 
> ExtensionArray instances for to_pandas conversions
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-2428
>                 URL: https://issues.apache.org/jira/browse/ARROW-2428
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>             Fix For: 1.0.0
>
>
> With the next release of Pandas, it will be possible to define custom column 
> types that back a {{pandas.Series}}. Thus we will not be able to cover all 
> possible column types in the {{to_pandas}} conversion by default as we won't 
> be aware of all extension arrays.
> To enable users to create {{ExtensionArray}} instances from Arrow columns in 
> the {{to_pandas}} conversion, we should provide a hook in the {{to_pandas}} 
> call where they can override the default conversion routines with ones that 
> produce their {{ExtensionArray}} instances.
> This should avoid additional copies in the case where we currently first 
> convert the Arrow column into a default pandas column (probably of object 
> type) and the user afterwards converts it to a more efficient 
> {{ExtensionArray}}. This hook will be especially useful for 
> {{ExtensionArrays}} whose storage is backed by Arrow.
> The meta-issue that tracks the implementation inside of Pandas is: 
> https://github.com/pandas-dev/pandas/issues/19696
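
The hook requested in the issue amounts to a user-supplied mapping consulted 
before the default conversion. A minimal dependency-free sketch, assuming a 
hypothetical table layout (column name to Arrow type name and values) and 
hypothetical names ({{to_pandas}} here is a toy function, {{overrides}}, 
{{UserExtensionArray}}); it shows the control flow only, not pyarrow's API.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical minimal table: column name -> (Arrow type name, values)
Table = Dict[str, Tuple[str, List[Any]]]
Converter = Callable[[List[Any]], Any]


def default_convert(type_name: str, values: List[Any]) -> List[Any]:
    """Stand-in for the built-in conversion (e.g. to an object column)."""
    return list(values)


def to_pandas(table: Table, overrides: Optional[Dict[str, Converter]] = None) -> Dict[str, Any]:
    """Toy conversion where user overrides take precedence per Arrow type."""
    overrides = overrides or {}
    out: Dict[str, Any] = {}
    for name, (type_name, values) in table.items():
        convert = overrides.get(type_name)
        if convert is not None:
            # User hook builds its ExtensionArray directly from the Arrow data,
            # skipping the intermediate default (object) column
            out[name] = convert(values)
        else:
            out[name] = default_convert(type_name, values)
    return out


class UserExtensionArray:
    """Hypothetical user ExtensionArray built straight from Arrow values."""

    def __init__(self, values: List[Any]) -> None:
        self.values = tuple(values)


table = {"a": ("int64", [1, 2]), "b": ("my_ext", ["x", "y"])}
result = to_pandas(table, overrides={"my_ext": UserExtensionArray})
print(result["a"])                 # [1, 2]
print(type(result["b"]).__name__)  # UserExtensionArray
```

Keying the overrides by Arrow type is one possible design; keying by column 
name would be an alternative if different columns of the same type need 
different pandas representations.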



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
