jorisvandenbossche commented on code in PR #34957:
URL: https://github.com/apache/arrow/pull/34957#discussion_r1163696817


##########
docs/source/python/extending_types.rst:
##########
@@ -357,3 +357,143 @@ pandas ``ExtensionArray``. This method should have the following signature::
 
 This way, you can control the conversion of a pyarrow ``Array`` of your pyarrow
 extension type to a pandas ``ExtensionArray`` that can be stored in a DataFrame.
+
+
+Canonical extension types
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You can find the official list of canonical extension types in the
+:ref:`format_canonical_extensions` section. Here we add examples of how to
+use them in pyarrow.
+
+Fixed size tensor
+"""""""""""""""""
+
+To create an array of tensors with equal shape (fixed shape tensor array), we
+first need to define a fixed shape tensor extension type with a value type
+and a shape:
+
+.. code-block:: python
+
+   >>> tensor_type = pa.fixed_shape_tensor(pa.int32(), (2, 2))
+
+Then we need the storage array with :func:`pyarrow.list_` type where ``value_type``
+is the fixed shape tensor value type and the list size is the product of the
+``tensor_type`` shape elements. Then we can create an array of tensors with the
+``pa.ExtensionArray.from_storage()`` method:
+
+.. code-block:: python
+
+   >>> arr = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
+   >>> storage = pa.array(arr, pa.list_(pa.int32(), 4))
+   >>> tensor_array = pa.ExtensionArray.from_storage(tensor_type, storage)
+
+We can also create another array of tensors with a different value type:
+
+.. code-block:: python
+
+   >>> tensor_type_2 = pa.fixed_shape_tensor(pa.float32(), (2, 2))
+   >>> storage_2 = pa.array(arr, pa.list_(pa.float32(), 4))
+   >>> tensor_array_2 = pa.ExtensionArray.from_storage(tensor_type_2, storage_2)
+
+Extension arrays can be used as columns in ``pyarrow.Table`` or
+``pyarrow.RecordBatch``:
+
+.. code-block:: python
+
+   >>> data = [
+   ...     pa.array([1, 2, 3]),
+   ...     pa.array(['foo', 'bar', None]),
+   ...     pa.array([True, None, True]),
+   ...     tensor_array,
+   ...     tensor_array_2
+   ... ]
+   >>> my_schema = pa.schema([('f0', pa.int8()),
+   ...                        ('f1', pa.string()),
+   ...                        ('f2', pa.bool_()),
+   ...                        ('tensors_int', tensor_type),
+   ...                        ('tensors_float', tensor_type_2)])
+   >>> table = pa.Table.from_arrays(data, schema=my_schema)
+   >>> table
+   pyarrow.Table
+   f0: int8
+   f1: string
+   f2: bool
+   tensors_int: extension<arrow.fixed_shape_tensor>
+   tensors_float: extension<arrow.fixed_shape_tensor>
+   ----
+   f0: [[1,2,3]]
+   f1: [["foo","bar",null]]
+   f2: [[true,null,true]]
+   tensors_int: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+   tensors_float: [[[1,2,3,4],[10,20,30,40],[100,200,300,400]]]
+
+We can also convert a tensor array to a numpy ndarray:

Review Comment:
   ```suggestion
   We can also convert a tensor array to a single multi-dimensional numpy ndarray:
   ```
   
   (to contrast it with the 1D result of `to_numpy()`)



##########
docs/source/python/extending_types.rst:
##########
@@ -357,3 +357,143 @@ pandas ``ExtensionArray``. This method should have the following signature::
 
+We can also convert a tensor array to a numpy ndarray:
+
+.. code-block:: python
+
+   >>> numpy_tensor = tensor_array_2.to_numpy_ndarray()
+   >>> numpy_tensor
+   array([[[  1.,   2.],
+         [  3.,   4.]],
+         [[ 10.,  20.],
+         [ 30.,  40.]],
+         [[100., 200.],
+         [300., 400.]]])
+
+.. note::
+
+   Both optional parameters of :func:`pyarrow.fixed_shape_tensor`, ``permutation`` and
+   ``dim_names``, are meant to provide the user with information about the logical
+   layout of the data compared to the physical layout.
+
+   The conversion to numpy ndarray is only possible for trivial permutations (``None`` or
+   ``[0, 1, ... N-1]`` where ``N`` is the number of tensor dimensions).
+
+And also the other way around, we can convert a list of numpy ndarrays to a fixed shape tensor

Review Comment:
   And I would maybe say something about the first dimension of the ndarray becoming the length of the extension array (and maybe give an example that an ndarray of shape (3, 2, 2) becomes an arrow array of length 3 with tensor elements of shape (2, 2)).



##########
docs/source/python/extending_types.rst:
##########
@@ -357,3 +357,143 @@ pandas ``ExtensionArray``. This method should have the following signature::
 
+And also the other way around, we can convert a list of numpy ndarrays to a fixed shape tensor

Review Comment:
   ```suggestion
   And also the other way around, we can convert a numpy ndarray to a fixed shape tensor
   ```



##########
docs/source/python/extending_types.rst:
##########
@@ -357,3 +357,143 @@ pandas ``ExtensionArray``. This method should have the following signature::
 
+We can also convert a tensor array to a numpy ndarray:
+
+.. code-block:: python
+
+   >>> numpy_tensor = tensor_array_2.to_numpy_ndarray()
+   >>> numpy_tensor
+   array([[[  1.,   2.],
+         [  3.,   4.]],
+         [[ 10.,  20.],

Review Comment:
   I think the alignment is a bit off here. If I copy-paste this from a console, it looks like:
   
   ```
   In [27]: tensor_array_2.to_numpy_ndarray()
   Out[27]: 
   array([[[  1.,   2.],
           [  3.,   4.]],
   
          [[ 10.,  20.],
           [ 30.,  40.]],
   
          [[100., 200.],
           [300., 400.]]], dtype=float32)
   ```
   
   So the square brackets are vertically aligned to better show the multiple dimensions, so I would try to do exactly the same here as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
