Dewey Dunnington created ARROW-16688:
----------------------------------------

             Summary: [R][Python] Extension types cannot be registered in both 
R and Python
                 Key: ARROW-16688
                 URL: https://issues.apache.org/jira/browse/ARROW-16688
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python, R
            Reporter: Dewey Dunnington


Now that extension types can be registered from the R bindings, it looks as 
though the same extension type cannot be registered in both R and Python within 
one process:

{code:R}
# apache/arrow@master
library(arrow, warn.conflicts = FALSE)
library(reticulate)

# this is a virtualenv with pyarrow installed against the same commit
use_virtualenv(
  "/Users/deweydunnington/Desktop/rscratch/pyarrow-dev",
  required = TRUE
)

pa <- import("pyarrow")
pa[["__version__"]]
#> [1] "9.0.0.dev131+g8a36f0f6c"

py_run_string("
import pyarrow as pa

class TestExtensionType(pa.ExtensionType):
    
    def __init__(self):
        super().__init__(pa.int32(), 'arrow.test_type')
    
    def __arrow_ext_serialize__(self):
        return b''

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return cls()


pa.register_extension_type(TestExtensionType())
")

arrow::register_extension_type(
  arrow::new_extension_type(int32(), "arrow.test_type")
)
#> Error: Key error: A type extension with name arrow.test_type already defined
{code}
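
One way to read the error above is that both bindings end up writing to a single name-keyed registry in the same process. The following is only a toy model of that reading, not Arrow's actual implementation: {{ExtensionRegistry}}, {{shared_registry}} and the {{owner}} argument are hypothetical names made up for illustration, and the assumption that R and Python share one registry is exactly the thing this issue is trying to establish.

```python
class ExtensionRegistry:
    """Toy stand-in for a process-wide registry keyed by extension name.

    Hypothetical class for illustration only; it is not an Arrow API.
    """

    def __init__(self):
        self._types = {}

    def register(self, name, owner):
        # A second registration under the same name is rejected,
        # regardless of which binding asks for it.
        if name in self._types:
            raise KeyError(
                f"A type extension with name {name} already defined"
            )
        self._types[name] = owner

    def owner_of(self, name):
        return self._types[name]


# Assumption: both bindings see the same registry because they load
# the same shared library into one process.
shared_registry = ExtensionRegistry()

# The Python side registers first, as in the report above.
shared_registry.register("arrow.test_type", owner="python")

# The R side then tries to register the same name.
try:
    shared_registry.register("arrow.test_type", owner="r")
    second_registration = "ok"
except KeyError as err:
    second_registration = f"failed: {err.args[0]}"

print(second_registration)
```

Under this model the first registration wins, so whichever package loads second fails, which is consistent with the order of operations in the reproduction.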

I also get a segfault if I try to surface a Python type into R (probably 
because the R bindings mistakenly assume that if {{type.id() == 
Type::EXTENSION}} then it is safe to cast to our own {{ExtensionType}} C++ 
subclass that implements R-specific things).
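
The suspected cause of the segfault can be sketched as an unchecked downcast. The snippet below is a toy illustration in Python, not the R bindings' code: {{RExtensionType}}, {{r_class}} and {{describe_for_r}} are hypothetical names, and the point is only that the R-specific fields should be touched after confirming the object really is the R-specific subclass. In C++ the analogous guard would presumably be a checked cast rather than an unconditional one, though that is an assumption about the fix.

```python
class ExtensionType:
    """Toy base class standing in for any registered extension type."""

    def __init__(self, name):
        self.name = name


class RExtensionType(ExtensionType):
    """Toy subclass carrying R-specific state (hypothetical)."""

    def __init__(self, name, r_class):
        super().__init__(name)
        self.r_class = r_class


def describe_for_r(ext_type):
    # Checked downcast: only read R-specific fields when the object
    # is actually an RExtensionType. Being an extension type at all
    # is not sufficient evidence.
    if isinstance(ext_type, RExtensionType):
        return f"{ext_type.name} (R class: {ext_type.r_class})"
    return f"{ext_type.name} (foreign extension type)"


from_r = RExtensionType("arrow.r_type", "ExtensionType")
from_python = ExtensionType("arrow.test_type")

print(describe_for_r(from_r))
print(describe_for_r(from_python))
```

A type registered from Python takes the fallback branch here instead of being treated as if it had R-specific members.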

This came about because the 'geoarrow' Python and 'geoarrow' R packages both 
register a number of extension type definitions.

- geoarrow's Python registration: 
https://github.com/jorisvandenbossche/python-geoarrow/blob/main/src/geoarrow/extension_types.py#L108-L117
- geoarrow's R registration: 
https://github.com/paleolimbot/geoarrow/blob/master/R/pkg-arrow.R#L208-L223

I can also force an interaction if I build GDAL against the same Arrow that the 
arrow R package is linked against and attempt to load a Feather file saved with 
an extension type using the sf package. I will attempt to recreate that 
interaction as well in both R and Python.

I don't know enough about linking to know to what extent this is specific to my 
own development setup/build of the R package, although I think there are at 
least some environments where a shared library is picked up first by the R 
config script (Fedora 36, for example). It does look like my own R package build 
is dynamically linking to libarrow.dylib.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
