jorisvandenbossche commented on code in PR #44070:
URL: https://github.com/apache/arrow/pull/44070#discussion_r1756267334
##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1926,3 +1926,31 @@ def test_bool8_scalar():
assert pa.scalar(1, type=pa.bool8()).as_py() is True
assert pa.scalar(2, type=pa.bool8()).as_py() is True
assert pa.scalar(None, type=pa.bool8()).as_py() is None
+
+
[email protected]("str_type", (pa.utf8, pa.large_utf8))
Review Comment:
The test doesn't work yet for `pa.utf8_view()`?
##########
python/pyarrow/types.pxi:
##########
@@ -5236,6 +5273,43 @@ def run_end_encoded(run_end_type, value_type):
return pyarrow_wrap_data_type(ree_type)
+def json(DataType storage_type):
Review Comment:
Use a default of utf8 ?
##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1926,3 +1926,31 @@ def test_bool8_scalar():
assert pa.scalar(1, type=pa.bool8()).as_py() is True
assert pa.scalar(2, type=pa.bool8()).as_py() is True
assert pa.scalar(None, type=pa.bool8()).as_py() is None
+
+
[email protected]("str_type", (pa.utf8, pa.large_utf8))
+def test_json(str_type):
+ storage_type = str_type()
+ data = ['{"a": 1}', '{"b": 2}', None]
+ storage = pa.array(data, type=storage_type)
+ json_type = pa.json(storage_type)
+
+ assert json_type.extension_name == "arrow.json"
+ assert json_type.storage_type == storage_type
+ assert json_type.__class__ is pa.JsonType
+
+ array = pa.ExtensionArray.from_storage(json_type, storage)
+
+ assert array.to_pylist() == data
+ assert array[0].as_py() == data[0]
+ assert array[2].as_py() is None
+
+ buf = ipc_write_batch(pa.RecordBatch.from_arrays([array], ["json"]))
Review Comment:
Can you also add a pickle roundtrip of the type? (see eg test_opaque_type
above)
##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1926,3 +1926,31 @@ def test_bool8_scalar():
assert pa.scalar(1, type=pa.bool8()).as_py() is True
assert pa.scalar(2, type=pa.bool8()).as_py() is True
assert pa.scalar(None, type=pa.bool8()).as_py() is None
+
+
[email protected]("str_type", (pa.utf8, pa.large_utf8))
+def test_json(str_type):
+ storage_type = str_type()
+ data = ['{"a": 1}', '{"b": 2}', None]
+ storage = pa.array(data, type=storage_type)
+ json_type = pa.json(storage_type)
+
+ assert json_type.extension_name == "arrow.json"
+ assert json_type.storage_type == storage_type
+ assert json_type.__class__ is pa.JsonType
Review Comment:
Can you also check equality at the type instance level?
```suggestion
assert json_type.__class__ is pa.JsonType
assert json_type == pa.json(storage_type)
assert json_type != storage_type
```
Which also makes me wonder: if the storage type is different (eg string vs
large string), does it then return False? (maybe also good to test that)
##########
cpp/src/arrow/extension/json.h:
##########
@@ -27,6 +27,11 @@
namespace arrow::extension {
+class ARROW_EXPORT JsonArray : public ExtensionArray {
+ public:
+ using ExtensionArray::ExtensionArray;
+};
Review Comment:
This is not used in C++, but I think also not actually needed for the Python
bindings?
##########
python/pyarrow/array.pxi:
##########
@@ -4343,6 +4343,30 @@ cdef class ExtensionArray(Array):
return result
+class JsonArray(ExtensionArray):
+ """
+ Concrete class for Arrow arrays of JSON data type.
+
Review Comment:
Should we maybe note here that this array class does not guarantee that the
data is actually valid JSON? (which I assume is the case?)
For example in the code example below, you can in theory put whatever string
data you want in the array.
##########
python/pyarrow/lib.pxd:
##########
@@ -226,6 +226,10 @@ cdef class UuidType(BaseExtensionType):
cdef:
const CUuidType* uuid_ext_type
+cdef class JsonType(BaseExtensionType):
+ cdef:
+ const CJsonType* json_ext_type
+
Review Comment:
```suggestion
```
##########
python/pyarrow/tests/test_extension_type.py:
##########
@@ -1926,3 +1926,31 @@ def test_bool8_scalar():
assert pa.scalar(1, type=pa.bool8()).as_py() is True
assert pa.scalar(2, type=pa.bool8()).as_py() is True
assert pa.scalar(None, type=pa.bool8()).as_py() is None
+
+
[email protected]("str_type", (pa.utf8, pa.large_utf8))
+def test_json(str_type):
+ storage_type = str_type()
+ data = ['{"a": 1}', '{"b": 2}', None]
+ storage = pa.array(data, type=storage_type)
+ json_type = pa.json(storage_type)
+
+ assert json_type.extension_name == "arrow.json"
+ assert json_type.storage_type == storage_type
+ assert json_type.__class__ is pa.JsonType
+
+ array = pa.ExtensionArray.from_storage(json_type, storage)
+
+ assert array.to_pylist() == data
+ assert array[0].as_py() == data[0]
+ assert array[2].as_py() is None
+
+ buf = ipc_write_batch(pa.RecordBatch.from_arrays([array], ["json"]))
Review Comment:
And maybe also a basic cast (again see the structure of `test_opaque_type`
and `test_bool8_type` above, you can follow that I think)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]