[ 
https://issues.apache.org/jira/browse/ARROW-15471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Keane resolved ARROW-15471.
------------------------------------
    Fix Version/s: 8.0.0
       Resolution: Fixed

Issue resolved by pull request 12467
[https://github.com/apache/arrow/pull/12467]

> [R] ExtensionType support in R
> ------------------------------
>
>                 Key: ARROW-15471
>                 URL: https://issues.apache.org/jira/browse/ARROW-15471
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dewey Dunnington
>            Assignee: Dewey Dunnington
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> In Python there is support for extension types that consists of a 
> registration step that defines functions to handle metadata serialization and 
> deserialization. In R, any extension name or metadata at the top level is 
> currently obliterated on import. To implement geometry reading and writing to 
> Parquet, IPC, and/or Feather, we will need to at the very least have the 
> extension name and metadata preserved (in R), and at best provide a 
> registration step to customize the behaviour of the resulting Array/DataType.
> Reprex for R:
> {code:R}
> # remotes::install_github("paleolimbot/narrow")
> library(narrow)
> carray <- as_narrow_array(1:5)
> carray$schema$metadata[["ARROW:extension:name"]] <- "extension name!"
> carray$schema$metadata[["ARROW:extension:metadata"]] <- "bananas"
> carray$schema$metadata[["something else"]] <- "more bananas"
> array <- from_narrow_array(carray, arrow::Array)
> carray2 <- as_narrow_array(array)
> carray2$schema$metadata[["ARROW:extension:name"]]
> #> NULL
> carray2$schema$metadata[["ARROW:extension:metadata"]]
> #> NULL
> carray2$schema$metadata[["something else"]]
> #> NULL
> {code}
> Extension types have also been discussed as a solution to ARROW-14378, 
> including an example of how pandas implements the 'interval' extension type 
> (example contributed by [~jorisvandenbossche]).
> For the interval example, the relevant pieces live in three places:
> - The Arrow Extension Type definition for pandas' interval type: 
> https://github.com/pandas-dev/pandas/blob/fc6b441ba527ca32b460ae4f4f5a6802335497f9/pandas/core/arrays/_arrow_utils.py#L88-L136
> - The __arrow_array__ implementation (conversion pandas -> arrow): 
> https://github.com/pandas-dev/pandas/blob/fc6b441ba527ca32b460ae4f4f5a6802335497f9/pandas/core/arrays/interval.py#L1405-L1455
> - The __from_arrow__ implementation (conversion arrow -> pandas): 
> https://github.com/pandas-dev/pandas/blob/fc6b441ba527ca32b460ae4f4f5a6802335497f9/pandas/core/dtypes/dtypes.py#L1227-L1255
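
The two levels of support the description asks for (preserving the extension name and metadata at minimum, and a registration step at best) can be illustrated with a toy model. This is only a sketch under stated assumptions: plain Python dicts stand in for Arrow schema metadata, and `register_extension` and `import_field` are hypothetical names invented for illustration, not functions from the arrow R package or pyarrow.

```python
# Toy model only: plain dicts stand in for Arrow schema metadata.
# `register_extension` and `import_field` are hypothetical, not arrow APIs.
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"

_registry = {}  # extension name -> callable(storage, serialized_metadata)


def register_extension(name, constructor):
    """Register a constructor that customizes values for one extension name."""
    _registry[name] = constructor


def import_field(storage, metadata):
    """Return (value, metadata) for an imported field.

    The metadata is always passed through untouched, which is the minimum
    the issue asks for. If a constructor is registered for the extension
    name, it wraps the storage, which is the registration step the issue
    describes as the best case.
    """
    preserved = dict(metadata)
    name = preserved.get(EXTENSION_NAME_KEY)
    constructor = _registry.get(name)
    if constructor is None:
        return storage, preserved
    return constructor(storage, preserved.get(EXTENSION_METADATA_KEY)), preserved


# Same keys and values as the R reprex in the issue description.
metadata = {
    EXTENSION_NAME_KEY: "extension name!",
    EXTENSION_METADATA_KEY: "bananas",
    "something else": "more bananas",
}

# Without registration: storage comes back as-is and the metadata survives.
value, kept = import_field([1, 2, 3, 4, 5], metadata)

# With registration: the constructor receives the serialized metadata.
register_extension(
    "extension name!",
    lambda storage, meta: {"data": storage, "label": meta},
)
wrapped, kept_again = import_field([1, 2, 3, 4, 5], metadata)
```

In this model the reprex's three metadata lookups would return their original values instead of NULL, and a registered extension name additionally changes what kind of object the caller gets back.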



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
