rok commented on code in PR #37166:
URL: https://github.com/apache/arrow/pull/37166#discussion_r1347320516
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -148,6 +148,112 @@ Fixed shape tensor
by this specification. Instead, this extension type lets one use fixed shape tensors
as elements in a field of a RecordBatch or a Table.
+.. _variable_shape_tensor_extension:
+
+Variable shape tensor
+=====================
+
+* Extension name: `arrow.variable_shape_tensor`.
+
+* The storage type of the extension is a ``StructArray`` composed of
+ **data** and **shape** fields, describing a single tensor per row:
+
+ * **data** is a ``List`` holding the elements of a single tensor.
+ The data type of the list elements is uniform across the entire column.
+ * **shape** is a ``FixedSizeList<uint32>[ndim]`` holding the tensor shape,
+ where the list size ``ndim`` equals the number of dimensions of the
+ tensor.
+
+* Extension type parameters:
+
+ * **value_type** = the Arrow data type of individual tensor elements.
+
+ Optional parameters describing the logical layout:
+
+ * **dim_names** = explicit names of the tensor dimensions,
+ given as an array. Its length should equal the number of dimensions,
+ i.e. the length of the shape.
+
+ ``dim_names`` can be used if the dimensions have well-known
+ names that map to the physical layout (row-major).
+
+ * **permutation** = indices of the desired ordering of the
+ original dimensions, defined as an array.
+
+ The indices contain a permutation of the values [0, 1, .., N-1], where
+ N is the number of dimensions. The permutation indicates which
+ dimension of the logical layout corresponds to which dimension of the
+ physical tensor: the i-th dimension of the logical view corresponds
+ to dimension ``permutation[i]`` of the physical tensor.
+
+ ``permutation`` is useful when the logical order of
+ the tensor is a permutation of the physical order (row-major).
+
+ When the logical and physical layouts are equal, the permutation is always
+ [0, 1, .., N-1] and can therefore be left out.
+
+ * **uniform_dimensions** = indices of dimensions whose sizes are
+ guaranteed to remain constant. The indices are a subset of all possible
+ dimension indices ([0, 1, .., N-1]).
+ The uniform dimensions must still be represented in the ``shape`` field
+ and must have the same size for all tensors in the array. This
+ allows code to interpret the tensor correctly without accounting for
+ uniform dimensions, while still permitting optional optimizations that
+ take advantage of the uniformity. ``uniform_dimensions`` can be left out,
+ in which case all dimensions are assumed to be potentially variable.
+
+ * **uniform_shape** = shape of the dimensions that are guaranteed to stay
+ constant over all tensors in the array, with the sizes of the ragged
+ dimensions set to 0.
+ For example, an array containing a tensor of shape (2, 3, 4) with uniform
+ dimensions (0, 2) would have uniform shape (2, 0, 4).
+
+* Description of the serialization:
+
+ The metadata must be a valid JSON object that optionally includes
+ dimension names with key **"dim_names"**, the ordering of
+ dimensions with key **"permutation"**, the indices of dimensions whose sizes
+ are guaranteed to remain constant with key **"uniform_dimensions"**, and the
+ shape of those dimensions with key **"uniform_shape"**.
+ The minimal metadata is an empty JSON object.
+
+ - Example of minimal metadata:
+
+ ``{}``
+
+ - Example with ``dim_names`` metadata for NCHW ordered data:
+
+ ``{ "dim_names": ["C", "H", "W"] }``
+
+ - Example with ``uniform_dimensions`` metadata for a set of color images
+ with variable width:
Review Comment:
The current language demands that both be given at the same time. That's imo
equivalent to just having `uniform_shape`, so let's simplify to `uniform_shape`
only.
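To make the equivalence concrete, here is a minimal plain-Python sketch. The helper names are hypothetical (not Arrow API), and it assumes ragged dimensions are always encoded as 0 in `uniform_shape`, as the proposed text says:

```python
# Hypothetical helpers, not part of any Arrow library. They only illustrate
# how uniform_dimensions and uniform_shape relate under the proposed wording.


def uniform_shape_from_dims(shape, uniform_dimensions):
    """Keep the size of each uniform dimension and set ragged ones to 0."""
    return [size if i in uniform_dimensions else 0 for i, size in enumerate(shape)]


def uniform_dims_from_shape(uniform_shape):
    """Recover uniform_dimensions as the indices with a non-zero size."""
    return [i for i, size in enumerate(uniform_shape) if size != 0]


# Example from the spec text: shape (2, 3, 4) with uniform dimensions (0, 2).
shape = [2, 3, 4]
uniform_dimensions = [0, 2]
uniform_shape = uniform_shape_from_dims(shape, uniform_dimensions)
print(uniform_shape)                           # [2, 0, 4]
print(uniform_dims_from_shape(uniform_shape))  # [0, 2]
```

The round trip only breaks down if a uniform dimension itself has size 0, since that is indistinguishable from a ragged one in `uniform_shape`.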
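For reference, a minimal sketch of the storage layout described in the hunk, with plain Python lists and dicts standing in for the `StructArray` rows. The helper names are hypothetical; in pyarrow the storage type would presumably be `pa.struct([("data", pa.list_(value_type)), ("shape", pa.list_(pa.uint32(), ndim))])`, but that is not exercised here:

```python
from math import prod


def flatten(tensor):
    """Yield the scalar elements of a nested-list tensor in row-major order."""
    if isinstance(tensor, list):
        for sub in tensor:
            yield from flatten(sub)
    else:
        yield tensor


def infer_shape(tensor):
    """Read the shape off the first element at each nesting level."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        if not tensor:
            break
        tensor = tensor[0]
    return shape


def to_storage_row(tensor):
    """Build one hypothetical {"data", "shape"} row for a single tensor."""
    data = list(flatten(tensor))
    shape = infer_shape(tensor)
    # Cheap consistency check: the flat data must hold prod(shape) elements.
    # This does not catch every malformed (ragged) nested list.
    if len(data) != prod(shape):
        raise ValueError(f"shape {shape} does not match {len(data)} elements")
    return {"data": data, "shape": shape}


def to_storage_column(tensors):
    """Build all rows; shape is FixedSizeList[ndim], so ndim must agree."""
    rows = [to_storage_row(t) for t in tensors]
    if len({len(r["shape"]) for r in rows}) > 1:
        raise ValueError("all tensors in a column must have the same ndim")
    return rows


# Two 2-D tensors with different shapes stored in the same column.
rows = to_storage_column([[[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]])
print(rows[0])  # {'data': [1, 2, 3, 4, 5, 6], 'shape': [2, 3]}
print(rows[1])  # {'data': [7, 8, 9, 10, 11, 12], 'shape': [3, 2]}
```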
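And a sketch of the `permutation` rule as worded (logical dimension i corresponds to physical dimension `permutation[i]`), again plain Python with hypothetical helper names, assuming `data` is stored row-major in the physical order:

```python
def check_permutation(permutation):
    """A valid permutation holds each of 0..N-1 exactly once."""
    if sorted(permutation) != list(range(len(permutation))):
        raise ValueError(f"not a permutation of 0..N-1: {permutation}")


def logical_shape(physical_shape, permutation):
    """Logical dimension i has the size of physical dimension permutation[i]."""
    check_permutation(permutation)
    if len(permutation) != len(physical_shape):
        raise ValueError("permutation length must equal the number of dimensions")
    return [physical_shape[p] for p in permutation]


def physical_offset(logical_index, physical_shape, permutation):
    """Row-major offset into the flat data of the element at a logical index."""
    check_permutation(permutation)
    # Scatter the logical coordinates back into physical dimension order.
    physical_index = [0] * len(physical_shape)
    for i, p in enumerate(permutation):
        physical_index[p] = logical_index[i]
    # Standard row-major offset over the physical shape.
    offset = 0
    for idx, size in zip(physical_index, physical_shape):
        offset = offset * size + idx
    return offset


print(logical_shape([2, 3, 4], [2, 0, 1]))                 # [4, 2, 3]
print(physical_offset([1, 1, 2], [2, 3, 4], [2, 0, 1]))    # 21
```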