jorisvandenbossche commented on code in PR #34957:
URL: https://github.com/apache/arrow/pull/34957#discussion_r1163696817
##########
docs/source/python/extending_types.rst:
##########
@@ -357,3 +357,143 @@ pandas ``ExtensionArray``. This method should have the
following signature::
This way, you can control the conversion of a pyarrow ``Array`` of your pyarrow
extension type to a pandas ``ExtensionArray`` that can be stored in a
DataFrame.
+
+
+Canonical extension types
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can find the official list of canonical extension types in the
+:ref:`format_canonical_extensions` section. Here we add examples on how to
+use them in pyarrow.
+
+Fixed size tensor
+"""""""""""""""""
+
+To create an array of tensors with equal shape (fixed shape tensor array) we
+first need to define a fixed shape tensor extension type with value type
+and shape:
+
+.. code-block:: python
+
+ >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
+
+Then we need the storage array with :func:`pyarrow.list_` type where ``value_type```
+is the fixed shape tensor value type and list size is a product of ``tensor_type``
+shape elements. Then we can create an array of tensors with
+``pa.ExtensionArray.from_storage()`` method:
+
+.. code-block:: python
+
+ >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+ >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+ >>> tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
+
+We can also create another array of tensors with different value type:
+
+.. code-block:: python
+
+ >>> tensor_type_2 = pa.fixed_shape_tensor(pa.float32(), (2, 2))
+ >>> storage_2 = pa.array(arr, pa.list_(pa.float32(), 4))
+ >>> tensor_array_2 = pa.ExtensionArray.from_storage(tensor_type_2, storage_2)
+
+Extension arrays can be used as columns in ``pyarrow.Table`` or
+``pyarrow.RecordBatch``:
+
+.. code-block:: python
+
+ >>> data = [
+ ... pa.array([1, 2, 3]),
+ ... pa.array(['foo', 'bar', None]),
+ ... pa.array([True, None, True]),
+ ... tensor_array,
+ ... tensor_array_2
+ ... ]
+ >>> my_schema = pa.schema([('f0', pa.int8()),
+ ... ('f1', pa.string()),
+ ... ('f2', pa.bool_()),
+ ... ('tensors_int', tensor_type),
+ ... ('tensors_float', tensor_type_2)])
+ >>> table = pa.Table.from_arrays(data, schema=my_schema)
+ >>> table
+ pyarrow.Table
+ f0: int8
+ f1: string
+ f2: bool
+ tensors_int: extension<arrow.fixed_size_tensor>
+ tensors_float: extension<arrow.fixed_size_tensor>
+ ----
+ f0: [[1,2,3]]
+ f1: [["foo","bar",null]]
+ f2: [[true,null,true]]
+ tensors_int: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+ tensors_float: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+
+We can also convert a tensor array to a numpy ndarray:
Review Comment:
```suggestion
We can also convert a tensor array to a single multi-dimensional numpy ndarray:
```
(to contrast it with the 1D result of `to_numpy()`)
##########
docs/source/python/extending_types.rst:
##########
@@ -357,3 +357,143 @@ pandas ``ExtensionArray``. This method should have the
following signature::
This way, you can control the conversion of a pyarrow ``Array`` of your pyarrow
extension type to a pandas ``ExtensionArray`` that can be stored in a
DataFrame.
+
+
+Canonical extension types
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can find the official list of canonical extension types in the
+:ref:`format_canonical_extensions` section. Here we add examples on how to
+use them in pyarrow.
+
+Fixed size tensor
+"""""""""""""""""
+
+To create an array of tensors with equal shape (fixed shape tensor array) we
+first need to define a fixed shape tensor extension type with value type
+and shape:
+
+.. code-block:: python
+
+ >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
+
+Then we need the storage array with :func:`pyarrow.list_` type where ``value_type```
+is the fixed shape tensor value type and list size is a product of ``tensor_type``
+shape elements. Then we can create an array of tensors with
+``pa.ExtensionArray.from_storage()`` method:
+
+.. code-block:: python
+
+ >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+ >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+ >>> tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
+
+We can also create another array of tensors with different value type:
+
+.. code-block:: python
+
+ >>> tensor_type_2 = pa.fixed_shape_tensor(pa.float32(), (2, 2))
+ >>> storage_2 = pa.array(arr, pa.list_(pa.float32(), 4))
+ >>> tensor_array_2 = pa.ExtensionArray.from_storage(tensor_type_2, storage_2)
+
+Extension arrays can be used as columns in ``pyarrow.Table`` or
+``pyarrow.RecordBatch``:
+
+.. code-block:: python
+
+ >>> data = [
+ ... pa.array([1, 2, 3]),
+ ... pa.array(['foo', 'bar', None]),
+ ... pa.array([True, None, True]),
+ ... tensor_array,
+ ... tensor_array_2
+ ... ]
+ >>> my_schema = pa.schema([('f0', pa.int8()),
+ ... ('f1', pa.string()),
+ ... ('f2', pa.bool_()),
+ ... ('tensors_int', tensor_type),
+ ... ('tensors_float', tensor_type_2)])
+ >>> table = pa.Table.from_arrays(data, schema=my_schema)
+ >>> table
+ pyarrow.Table
+ f0: int8
+ f1: string
+ f2: bool
+ tensors_int: extension<arrow.fixed_size_tensor>
+ tensors_float: extension<arrow.fixed_size_tensor>
+ ----
+ f0: [[1,2,3]]
+ f1: [["foo","bar",null]]
+ f2: [[true,null,true]]
+ tensors_int: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+ tensors_float: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+
+We can also convert a tensor array to a numpy ndarray:
+
+.. code-block:: python
+
+ >>> numpy_tensor = tensor_array_2.to_numpy_ndarray()
+ >>> numpy_tensor
+ array([[[ 1., 2.],
+ [ 3., 4.]],
+ [[ 10., 20.],
+ [ 30., 40.]],
+ [[100., 200.],
+ [300., 400.]]])
+
+.. note::
+
+ Both optional parameters, ``permutation`` and ``dim_names``, are meant to provide the user
+ with the information about the logical layout of the data compared to the physical layout.
+
+ The conversion to numpy ndarray is only possible for trivial permutations (``None`` or
+ ``[0, 1, ... N-1]`` where ``N`` is the number of tensor dimensions).
+
+And also the other way around, we can convert a list of numpy ndarrays to a fixed shape tensor
Review Comment:
And I would maybe say something about the first dimension of the ndarray
becoming the length of the extension array (and maybe give an example that
ndarray of shape (3, 2, 2) becomes an arrow array of length 3 with tensor
elements of shape (2, 2))
##########
docs/source/python/extending_types.rst:
##########
@@ -357,3 +357,143 @@ pandas ``ExtensionArray``. This method should have the
following signature::
This way, you can control the conversion of a pyarrow ``Array`` of your pyarrow
extension type to a pandas ``ExtensionArray`` that can be stored in a
DataFrame.
+
+
+Canonical extension types
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can find the official list of canonical extension types in the
+:ref:`format_canonical_extensions` section. Here we add examples on how to
+use them in pyarrow.
+
+Fixed size tensor
+"""""""""""""""""
+
+To create an array of tensors with equal shape (fixed shape tensor array) we
+first need to define a fixed shape tensor extension type with value type
+and shape:
+
+.. code-block:: python
+
+ >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
+
+Then we need the storage array with :func:`pyarrow.list_` type where ``value_type```
+is the fixed shape tensor value type and list size is a product of ``tensor_type``
+shape elements. Then we can create an array of tensors with
+``pa.ExtensionArray.from_storage()`` method:
+
+.. code-block:: python
+
+ >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+ >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+ >>> tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
+
+We can also create another array of tensors with different value type:
+
+.. code-block:: python
+
+ >>> tensor_type_2 = pa.fixed_shape_tensor(pa.float32(), (2, 2))
+ >>> storage_2 = pa.array(arr, pa.list_(pa.float32(), 4))
+ >>> tensor_array_2 = pa.ExtensionArray.from_storage(tensor_type_2, storage_2)
+
+Extension arrays can be used as columns in ``pyarrow.Table`` or
+``pyarrow.RecordBatch``:
+
+.. code-block:: python
+
+ >>> data = [
+ ... pa.array([1, 2, 3]),
+ ... pa.array(['foo', 'bar', None]),
+ ... pa.array([True, None, True]),
+ ... tensor_array,
+ ... tensor_array_2
+ ... ]
+ >>> my_schema = pa.schema([('f0', pa.int8()),
+ ... ('f1', pa.string()),
+ ... ('f2', pa.bool_()),
+ ... ('tensors_int', tensor_type),
+ ... ('tensors_float', tensor_type_2)])
+ >>> table = pa.Table.from_arrays(data, schema=my_schema)
+ >>> table
+ pyarrow.Table
+ f0: int8
+ f1: string
+ f2: bool
+ tensors_int: extension<arrow.fixed_size_tensor>
+ tensors_float: extension<arrow.fixed_size_tensor>
+ ----
+ f0: [[1,2,3]]
+ f1: [["foo","bar",null]]
+ f2: [[true,null,true]]
+ tensors_int: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+ tensors_float: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+
+We can also convert a tensor array to a numpy ndarray:
+
+.. code-block:: python
+
+ >>> numpy_tensor = tensor_array_2.to_numpy_ndarray()
+ >>> numpy_tensor
+ array([[[ 1., 2.],
+ [ 3., 4.]],
+ [[ 10., 20.],
+ [ 30., 40.]],
+ [[100., 200.],
+ [300., 400.]]])
+
+.. note::
+
+ Both optional parameters, ``permutation`` and ``dim_names``, are meant to provide the user
+ with the information about the logical layout of the data compared to the physical layout.
+
+ The conversion to numpy ndarray is only possible for trivial permutations (``None`` or
+ ``[0, 1, ... N-1]`` where ``N`` is the number of tensor dimensions).
+
+And also the other way around, we can convert a list of numpy ndarrays to a fixed shape tensor
Review Comment:
```suggestion
And also the other way around, we can convert a numpy ndarray to a fixed shape tensor
```
##########
docs/source/python/extending_types.rst:
##########
@@ -357,3 +357,143 @@ pandas ``ExtensionArray``. This method should have the
following signature::
This way, you can control the conversion of a pyarrow ``Array`` of your pyarrow
extension type to a pandas ``ExtensionArray`` that can be stored in a
DataFrame.
+
+
+Canonical extension types
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can find the official list of canonical extension types in the
+:ref:`format_canonical_extensions` section. Here we add examples on how to
+use them in pyarrow.
+
+Fixed size tensor
+"""""""""""""""""
+
+To create an array of tensors with equal shape (fixed shape tensor array) we
+first need to define a fixed shape tensor extension type with value type
+and shape:
+
+.. code-block:: python
+
+ >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
+
+Then we need the storage array with :func:`pyarrow.list_` type where ``value_type```
+is the fixed shape tensor value type and list size is a product of ``tensor_type``
+shape elements. Then we can create an array of tensors with
+``pa.ExtensionArray.from_storage()`` method:
+
+.. code-block:: python
+
+ >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+ >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+ >>> tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
+
+We can also create another array of tensors with different value type:
+
+.. code-block:: python
+
+ >>> tensor_type_2 = pa.fixed_shape_tensor(pa.float32(), (2, 2))
+ >>> storage_2 = pa.array(arr, pa.list_(pa.float32(), 4))
+ >>> tensor_array_2 = pa.ExtensionArray.from_storage(tensor_type_2, storage_2)
+
+Extension arrays can be used as columns in ``pyarrow.Table`` or
+``pyarrow.RecordBatch``:
+
+.. code-block:: python
+
+ >>> data = [
+ ... pa.array([1, 2, 3]),
+ ... pa.array(['foo', 'bar', None]),
+ ... pa.array([True, None, True]),
+ ... tensor_array,
+ ... tensor_array_2
+ ... ]
+ >>> my_schema = pa.schema([('f0', pa.int8()),
+ ... ('f1', pa.string()),
+ ... ('f2', pa.bool_()),
+ ... ('tensors_int', tensor_type),
+ ... ('tensors_float', tensor_type_2)])
+ >>> table = pa.Table.from_arrays(data, schema=my_schema)
+ >>> table
+ pyarrow.Table
+ f0: int8
+ f1: string
+ f2: bool
+ tensors_int: extension<arrow.fixed_size_tensor>
+ tensors_float: extension<arrow.fixed_size_tensor>
+ ----
+ f0: [[1,2,3]]
+ f1: [["foo","bar",null]]
+ f2: [[true,null,true]]
+ tensors_int: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+ tensors_float: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+
+We can also convert a tensor array to a numpy ndarray:
+
+.. code-block:: python
+
+ >>> numpy_tensor = tensor_array_2.to_numpy_ndarray()
+ >>> numpy_tensor
+ array([[[ 1., 2.],
+ [ 3., 4.]],
+ [[ 10., 20.],
Review Comment:
I think the alignment is a bit off here. If I copy-paste this from a console,
it looks like:
```
In [27]: tensor_array_2.to_numpy_ndarray()
Out[27]:
array([[[  1.,   2.],
        [  3.,   4.]],
       [[ 10.,  20.],
        [ 30.,  40.]],
       [[100., 200.],
        [300., 400.]]], dtype=float32)
```
So the square brackets are vertically aligned to better show the multiple
dimensions. I would try to reproduce that alignment exactly here as well.