sjperkins commented on code in PR #33802:
URL: https://github.com/apache/arrow/pull/33802#discussion_r1102682069
##########
python/pyarrow/tests/test_cython.py:
##########
@@ -163,6 +163,40 @@ def test_cython_api(tmpdir):
                               env=subprocess_env)
+@pytest.mark.cython
+def test_extension_type(tmpdir):
+    with tmpdir.as_cwd():
+        # Set up temporary workspace
+        pyx_file = 'extensions.pyx'
+        shutil.copyfile(os.path.join(here, pyx_file),
+                        os.path.join(str(tmpdir), pyx_file))
+        # Create setup.py file
+        setup_code = setup_template.format(pyx_file=pyx_file,
+                                           compiler_opts=compiler_opts,
+                                           test_ld_path=test_ld_path)
+        with open('setup.py', 'w') as f:
+            f.write(setup_code)
+
+        subprocess_env = test_util.get_modified_env_with_pythonpath()
+
+        # Compile extension module
+        subprocess.check_call([sys.executable, 'setup.py',
+                               'build_ext', '--inplace'],
+                              env=subprocess_env)
+
+    sys.path.insert(0, str(tmpdir))
+    mod = __import__('extensions')
+
+    uuid_type = mod._make_uuid_type()
+    assert uuid_type.extension_name == "uuid"
+    assert uuid_type.storage_type == pa.binary(16)
+
+    array = mod._make_uuid_array()
+    assert array.to_pylist() == [b'abcdefghijklmno0', b'0onmlkjihgfedcba']
+    assert array[0].as_py() == b'abcdefghijklmno0'
+    assert array[1].as_py() == b'0onmlkjihgfedcba'
+
+
Review Comment:
> I am not sure it's necessarily needed, but another test you could add is
> put this extension array in a RecordBatch, send it through IPC to ensure it's
> still the correct extension type afterwards. There are some helpers to make
> this easier in `test_extension_type.py` (see the
> `ipc_write_batch`/`ipc_read_batch` usage)

I've added an IPC serialisation test here, and it currently fails. I discuss
the failure further down.
> This would require exposing an additional function in cython wrapped C++
> snippet to register the type.

Which additional function is required here? I wonder whether this is possible
at all.

The reason is that `BaseExtensionType` provides default implementations of the
`__arrow_ext_class__` and `__arrow_ext_scalar__` methods, but not of the
`__arrow_ext_serialize__` and `__arrow_ext_deserialize__` methods. Ideally
these would call into the C++ `Serialize`/`Deserialize` methods, but it's not
immediately clear how to accomplish that at the moment without implementing
https://github.com/apache/arrow/issues/33997 in some form.