westonpace commented on code in PR #37166:
URL: https://github.com/apache/arrow/pull/37166#discussion_r1303654040


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -148,6 +148,60 @@ Fixed shape tensor
   by this specification. Instead, this extension type lets one use fixed shape 
tensors
   as elements in a field of a RecordBatch or a Table.
 
+.. _variable_shape_tensor_extension:
+
+Variable shape tensor
+=====================
+
+* Extension name: `arrow.variable_shape_tensor`.
+
+* The storage type of the extension is: ``StructArray`` where struct
+  is composed of **data** and **shape** fields describing a single
+  tensor per row:
+
+  * **data** is a ``List`` holding tensor elements of a single tensor.
+    Data type of the list elements is uniform across the entire column
+    and also provided in metadata.
+  * **shape** is a ``FixedSizeList`` of the tensor shape where
+    the size of the list is equal to the number of dimensions of the
+    tensor.

Review Comment:
   FixedSizeList of what?  Int32?  Int16?  Any integer type?



##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -148,6 +148,60 @@ Fixed shape tensor
   by this specification. Instead, this extension type lets one use fixed shape 
tensors
   as elements in a field of a RecordBatch or a Table.
 
+.. _variable_shape_tensor_extension:
+
+Variable shape tensor
+=====================
+
+* Extension name: `arrow.variable_shape_tensor`.
+
+* The storage type of the extension is: ``StructArray`` where struct
+  is composed of **data** and **shape** fields describing a single
+  tensor per row:
+
+  * **data** is a ``List`` holding tensor elements of a single tensor.
+    Data type of the list elements is uniform across the entire column
+    and also provided in metadata.
+  * **shape** is a ``FixedSizeList`` of the tensor shape where
+    the size of the list is equal to the number of dimensions of the
+    tensor.
+
+* Extension type parameters:
+
+  * **value_type** = the Arrow data type of individual tensor elements.
+  * **ndim** = the number of dimensions of the tensor.
+
+  Optional parameters describing the logical layout:
+
+  * **dim_names** = same as `arrow.fixed_shape_tensor`
+
+  * **permutation** = same as in `arrow.fixed_shape_tensor`
+
+* Description of the serialization:
+
+  The metadata must be a valid JSON object including number of
+  dimensions of the contained tensors as an integer with key **"ndim"**
+  plus optional dimension names with keys **"dim_names"** and ordering of
+  the dimensions with key **"permutation"**.
+
+  - Example: ``{ "ndim": 2}``
+  - Example with ``dim_names`` metadata for NCHW ordered data:
+
+    ``{ "ndim": 3, "dim_names": ["C", "H", "W"]}``
+
+  - Example of permuted 3-dimensional tensor:
+
+    ``{ "ndim": 3, "permutation": [2, 0, 1]}``
+
+    This is the physical layout shape and the shape of the logical
+    layout would given an individual tensor of shape [100, 200, 500]
+    be ``[500, 100, 200]``.

Review Comment:
   The way the description for this type (and for the fixed size tensor type 
actually) reads it makes it seem like there should be a `value_type` parameter 
in the JSON.  However, I think this parameter is actually expressed by the 
choice of storage type right?  E.g. if the storage type is `struct<data: 
List<Float64>, shape: FixedSizeList<3>>` then one can infer that the 
`value_type` is `Float64`.
   
   However, couldn't you also infer that `ndim` is 3?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to