jorisvandenbossche commented on code in PR #33802:
URL: https://github.com/apache/arrow/pull/33802#discussion_r1093228385


##########
python/pyarrow/types.pxi:
##########
@@ -968,27 +988,6 @@ cdef class ExtensionType(BaseExtensionType):
         """
         return NotImplementedError
 
-    def __arrow_ext_class__(self):

Review Comment:
   I am wondering if we would like to keep this here as well. It would give 
some duplication, but for someone looking at `ExtensionType` for implementing a 
subclass yourself, it's nice that the 4 special dunder methods are all 
together, and not spread between this class and the base class. 
   
   (to avoid duplication, we can maybe leave out the docstring on the base 
class?)



##########
python/pyarrow/tests/test_cython.py:
##########
@@ -163,6 +163,40 @@ def test_cython_api(tmpdir):
                               env=subprocess_env)
 
 
[email protected]
+def test_extension_type(tmpdir):
+    with tmpdir.as_cwd():
+        # Set up temporary workspace
+        pyx_file = 'extensions.pyx'
+        shutil.copyfile(os.path.join(here, pyx_file),
+                        os.path.join(str(tmpdir), pyx_file))
+        # Create setup.py file
+        setup_code = setup_template.format(pyx_file=pyx_file,
+                                           compiler_opts=compiler_opts,
+                                           test_ld_path=test_ld_path)
+        with open('setup.py', 'w') as f:
+            f.write(setup_code)
+
+        subprocess_env = test_util.get_modified_env_with_pythonpath()
+
+        # Compile extension module
+        subprocess.check_call([sys.executable, 'setup.py',
+                               'build_ext', '--inplace'],
+                              env=subprocess_env)
+
+    sys.path.insert(0, str(tmpdir))
+    mod = __import__('extensions')
+
+    uuid_type = mod._make_uuid_type()
+    assert uuid_type.extension_name == "uuid"
+    assert uuid_type.storage_type == pa.binary(16)
+
+    array = mod._make_uuid_array()

Review Comment:
   Can you add an assert of that the resulting array is an instance of 
BaseExtensionType? (and maybe that the `array.type == uuid_type`)



##########
python/pyarrow/tests/test_cython.py:
##########
@@ -163,6 +163,40 @@ def test_cython_api(tmpdir):
                               env=subprocess_env)
 
 
[email protected]
+def test_extension_type(tmpdir):

Review Comment:
   I think it is fine either way. It could also be moved to 
`test_extension_type.py`, and then you can import some of the needed utilities 
from test_cython.py.



##########
python/pyarrow/types.pxi:
##########
@@ -836,6 +836,26 @@ cdef class BaseExtensionType(DataType):
         DataType.init(self, type)
         self.ext_type = <const CExtensionType*> type.get()
 
+    def __arrow_ext_class__(self):
+        """Return an extension array class to be used for building or
+        deserializing arrays with this extension type.
+
+        This method should return a subclass of the ExtensionArray class. By
+        default, if not specialized in the extension implementation, an
+        extension type array will be a built-in ExtensionArray instance.
+        """
+        return ExtensionArray

Review Comment:
   > However, this does seem to just work.
   
   Yes, this works as long as you are fine with the base implementation of that 
class (eg it's implementation of to_numpy, etc). 
   Once you want to customize that, you'll need a custom class, but I am not 
sure if that's straightforward to do at the moment ("register" such a custom 
class for extension type that is actually implemented in C++). In any case, for 
this PR using ExtensionArray is fine (since a dummy generated class would not 
give anything of additional functionality, just a different name of the class)
   
   



##########
python/pyarrow/tests/test_cython.py:
##########
@@ -163,6 +163,40 @@ def test_cython_api(tmpdir):
                               env=subprocess_env)
 
 
[email protected]
+def test_extension_type(tmpdir):
+    with tmpdir.as_cwd():
+        # Set up temporary workspace
+        pyx_file = 'extensions.pyx'
+        shutil.copyfile(os.path.join(here, pyx_file),
+                        os.path.join(str(tmpdir), pyx_file))
+        # Create setup.py file
+        setup_code = setup_template.format(pyx_file=pyx_file,
+                                           compiler_opts=compiler_opts,
+                                           test_ld_path=test_ld_path)
+        with open('setup.py', 'w') as f:
+            f.write(setup_code)
+
+        subprocess_env = test_util.get_modified_env_with_pythonpath()
+
+        # Compile extension module
+        subprocess.check_call([sys.executable, 'setup.py',
+                               'build_ext', '--inplace'],
+                              env=subprocess_env)
+
+    sys.path.insert(0, str(tmpdir))
+    mod = __import__('extensions')
+
+    uuid_type = mod._make_uuid_type()
+    assert uuid_type.extension_name == "uuid"
+    assert uuid_type.storage_type == pa.binary(16)
+
+    array = mod._make_uuid_array()
+    assert array.to_pylist() == [b'abcdefghijklmno0', b'0onmlkjihgfedcba']
+    assert array[0].as_py() == b'abcdefghijklmno0'
+    assert array[1].as_py() == b'0onmlkjihgfedcba'
+
+

Review Comment:
   I am not sure it's necessarily needed, but another test you could add is put 
this extension array in a RecordBatch, send it through IPC to ensure it's still 
the correct extension type afterwards. There are some helpers to make this 
easier in `test_extension_type.py` (see the `ipc_write_batch`/`ipc_read_batch` 
usage)
   
   This would require exposing an additional function in cython wrapped C++ 
snippet to register the type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to