jorisvandenbossche commented on code in PR #33802:
URL: https://github.com/apache/arrow/pull/33802#discussion_r1093228385
##########
python/pyarrow/types.pxi:
##########
@@ -968,27 +988,6 @@ cdef class ExtensionType(BaseExtensionType):
"""
return NotImplementedError
- def __arrow_ext_class__(self):
Review Comment:
I am wondering if we would like to keep this here as well. It would give
some duplication, but for someone looking at `ExtensionType` for implementing a
subclass yourself, it's nice that the 4 special dunder methods are all
together, and not spread between this class and the base class.
(to avoid duplication, we can maybe leave out the docstring on the base
class?)
##########
python/pyarrow/tests/test_cython.py:
##########
@@ -163,6 +163,40 @@ def test_cython_api(tmpdir):
env=subprocess_env)
[email protected]
+def test_extension_type(tmpdir):
+ with tmpdir.as_cwd():
+ # Set up temporary workspace
+ pyx_file = 'extensions.pyx'
+ shutil.copyfile(os.path.join(here, pyx_file),
+ os.path.join(str(tmpdir), pyx_file))
+ # Create setup.py file
+ setup_code = setup_template.format(pyx_file=pyx_file,
+ compiler_opts=compiler_opts,
+ test_ld_path=test_ld_path)
+ with open('setup.py', 'w') as f:
+ f.write(setup_code)
+
+ subprocess_env = test_util.get_modified_env_with_pythonpath()
+
+ # Compile extension module
+ subprocess.check_call([sys.executable, 'setup.py',
+ 'build_ext', '--inplace'],
+ env=subprocess_env)
+
+ sys.path.insert(0, str(tmpdir))
+ mod = __import__('extensions')
+
+ uuid_type = mod._make_uuid_type()
+ assert uuid_type.extension_name == "uuid"
+ assert uuid_type.storage_type == pa.binary(16)
+
+ array = mod._make_uuid_array()
Review Comment:
Can you add an assert of that the resulting array is an instance of
BaseExtensionType? (and maybe that the `array.type == uuid_type`)
##########
python/pyarrow/tests/test_cython.py:
##########
@@ -163,6 +163,40 @@ def test_cython_api(tmpdir):
env=subprocess_env)
[email protected]
+def test_extension_type(tmpdir):
Review Comment:
I think it is fine either way. It could also be moved to
`test_extension_type.py`, and then you can import some of the needed utilities
from test_cython.py.
##########
python/pyarrow/types.pxi:
##########
@@ -836,6 +836,26 @@ cdef class BaseExtensionType(DataType):
DataType.init(self, type)
self.ext_type = <const CExtensionType*> type.get()
+ def __arrow_ext_class__(self):
+ """Return an extension array class to be used for building or
+ deserializing arrays with this extension type.
+
+ This method should return a subclass of the ExtensionArray class. By
+ default, if not specialized in the extension implementation, an
+ extension type array will be a built-in ExtensionArray instance.
+ """
+ return ExtensionArray
Review Comment:
> However, this does seem to just work.
Yes, this works as long as you are fine with the base implementation of that
class (eg it's implementation of to_numpy, etc).
Once you want to customize that, you'll need a custom class, but I am not
sure if that's straightforward to do at the moment ("register" such a custom
class for extension type that is actually implemented in C++). In any case, for
this PR using ExtensionArray is fine (since a dummy generated class would not
give anything of additional functionality, just a different name of the class)
##########
python/pyarrow/tests/test_cython.py:
##########
@@ -163,6 +163,40 @@ def test_cython_api(tmpdir):
env=subprocess_env)
[email protected]
+def test_extension_type(tmpdir):
+ with tmpdir.as_cwd():
+ # Set up temporary workspace
+ pyx_file = 'extensions.pyx'
+ shutil.copyfile(os.path.join(here, pyx_file),
+ os.path.join(str(tmpdir), pyx_file))
+ # Create setup.py file
+ setup_code = setup_template.format(pyx_file=pyx_file,
+ compiler_opts=compiler_opts,
+ test_ld_path=test_ld_path)
+ with open('setup.py', 'w') as f:
+ f.write(setup_code)
+
+ subprocess_env = test_util.get_modified_env_with_pythonpath()
+
+ # Compile extension module
+ subprocess.check_call([sys.executable, 'setup.py',
+ 'build_ext', '--inplace'],
+ env=subprocess_env)
+
+ sys.path.insert(0, str(tmpdir))
+ mod = __import__('extensions')
+
+ uuid_type = mod._make_uuid_type()
+ assert uuid_type.extension_name == "uuid"
+ assert uuid_type.storage_type == pa.binary(16)
+
+ array = mod._make_uuid_array()
+ assert array.to_pylist() == [b'abcdefghijklmno0', b'0onmlkjihgfedcba']
+ assert array[0].as_py() == b'abcdefghijklmno0'
+ assert array[1].as_py() == b'0onmlkjihgfedcba'
+
+
Review Comment:
I am not sure it's necessarily needed, but another test you could add is put
this extension array in a RecordBatch, send it through IPC to ensure it's still
the correct extension type afterwards. There are some helpers to make this
easier in `test_extension_type.py` (see the `ipc_write_batch`/`ipc_read_batch`
usage)
This would require exposing an additional function in cython wrapped C++
snippet to register the type.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]