+1
On Mon, 6 Mar 2023 at 12:41, Alenka Frim wrote:
> Hi all,
>
> I am starting a new voting thread with this email, as the first voting
> thread [1] opened up new comments and suggestions and we wanted to take
> time to see how the discussion evolved.
>
> *I would like to propose that we vote on adding the fixed shape tensor
> canonical extension type with the following specification:*
>
> Fixed shape tensor
> ==================
>
> * Extension name: ``arrow.fixed_shape_tensor``.
>
> * The storage type of the extension: ``FixedSizeList`` where:
>
>   * **value_type** is the data type of individual tensor elements.
>   * **list_size** is the product of all the elements in the tensor shape.
>
> * Extension type parameters:
>
>   * **value_type** = the Arrow data type of individual tensor elements.
>   * **shape** = the physical shape of the contained tensors
>     as an array.
>
>   Optional parameters describing the logical layout:
>
>   * **dim_names** = explicit names for the tensor dimensions
>     as an array. Its length should be equal to the length of the
>     shape, i.e. the number of dimensions.
>
>     ``dim_names`` can be used if the dimensions have well-known
>     names and they map to the physical layout (row-major).
>
>   * **permutation** = indices of the desired ordering of the
>     original dimensions, defined as an array.
>
>     The indices contain a permutation of the values [0, 1, ..., N-1] where
>     N is the number of dimensions. The permutation indicates which
>     dimension of the logical layout corresponds to which dimension of the
>     physical tensor (the i-th dimension of the logical view corresponds
>     to the dimension with number ``permutation[i]`` of the physical
>     tensor).
>
>     A permutation is useful when the logical order of the tensor
>     dimensions is a permutation of the physical (row-major) order.
>
>     When the logical and physical layouts are equal, the permutation is
>     always [0, 1, ..., N-1] and can therefore be left out.
>
> * Description of the serialization:
>
>   The metadata must be a valid JSON object that includes the shape of
>   the contained tensors as an array with key **"shape"**, plus optional
>   dimension names with key **"dim_names"** and the ordering of the
>   dimensions with key **"permutation"**.
>
>   - Example: ``{ "shape": [2, 5]}``
>   - Example with ``dim_names`` metadata for NCHW-ordered data, where the
>     N (batch) dimension corresponds to the array itself, so only C, H and W
>     are named:
>
>     ``{ "shape": [100, 200, 500], "dim_names": ["C", "H", "W"]}``
>
>   - Example of a permuted 3-dimensional tensor:
>
>     ``{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}``
>
>     Here ``shape`` is the physical layout shape; the shape of the logical
>     layout would in this case be ``[500, 100, 200]``.
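
To check my reading of the parameters and the serialization, here is a minimal sketch in plain Python (standard library only, no pyarrow) that parses the JSON metadata, derives the ``FixedSizeList`` list_size as the product of ``shape``, and derives the logical shape from ``permutation``. The helper name ``parse_fixed_shape_tensor_metadata`` is hypothetical and not part of any Arrow API; it assumes the metadata keys exactly as proposed above.

```python
import json
import math


def parse_fixed_shape_tensor_metadata(serialized):
    """Parse and sanity-check fixed shape tensor metadata (illustrative only).

    Hypothetical helper: it is not part of pyarrow or Arrow C++.
    """
    meta = json.loads(serialized)
    shape = list(meta["shape"])  # required: physical (row-major) shape
    ndim = len(shape)

    dim_names = meta.get("dim_names")
    if dim_names is not None and len(dim_names) != ndim:
        raise ValueError("dim_names needs one entry per dimension")

    # An omitted permutation means the logical layout equals the physical one.
    permutation = list(meta.get("permutation", range(ndim)))
    if sorted(permutation) != list(range(ndim)):
        raise ValueError("permutation must be a permutation of [0, ..., N-1]")

    return {
        "shape": shape,
        "dim_names": dim_names,
        "permutation": permutation,
        # FixedSizeList list_size: product of all physical dimensions.
        "list_size": math.prod(shape),
        # The i-th logical dimension is physical dimension permutation[i].
        "logical_shape": [shape[p] for p in permutation],
    }


meta = parse_fixed_shape_tensor_metadata(
    '{ "shape": [100, 200, 500], "permutation": [2, 0, 1]}'
)
print(meta["list_size"])      # 10000000
print(meta["logical_shape"])  # [500, 100, 200]
```

For the permuted example this reproduces the logical shape ``[500, 100, 200]`` stated in the proposal.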
>
> .. note::
>
>    Elements in a fixed shape tensor extension array are stored
>    in row-major/C-contiguous order.
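
Because elements are stored in row-major order, the position of an element inside its ``FixedSizeList`` slot follows from the physical shape alone; a logical index first has to be mapped back through ``permutation``. A minimal sketch under that reading, with hypothetical helper names (``row_major_offset``, ``logical_to_physical_index``) that are not part of any Arrow API:

```python
def row_major_offset(shape, index):
    """Flat position of `index` inside one tensor stored in row-major order."""
    if len(index) != len(shape):
        raise ValueError("index needs one entry per dimension")
    offset = 0
    for dim_size, i in zip(shape, index):
        if not 0 <= i < dim_size:
            raise IndexError("index out of bounds")
        # Horner-style accumulation: equivalent to summing index * stride,
        # where each stride is the product of the trailing dimension sizes.
        offset = offset * dim_size + i
    return offset


def logical_to_physical_index(permutation, logical_index):
    """Map a logical index to a physical one.

    The i-th logical dimension is physical dimension permutation[i].
    """
    physical = [0] * len(permutation)
    for i, p in enumerate(permutation):
        physical[p] = logical_index[i]
    return physical


# One 2x5 tensor occupies a FixedSizeList slot of 10 values.
print(row_major_offset([2, 5], (1, 3)))  # 8

# Permuted example: physical shape [100, 200, 500], logical [500, 100, 200].
phys = logical_to_physical_index([2, 0, 1], (499, 99, 199))
print(phys)                                     # [99, 199, 499]
print(row_major_offset([100, 200, 500], phys))  # 9999999
```

The last logical element maps to the last of the 10,000,000 values in the slot, which is consistent with the permutation only changing the logical view, not the storage.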
>
> * The specification is submitted as a PR [2] to the Canonical Extension
> Types document in the format specifications directory [3].
>
> There are also two implementations submitted to the Apache Arrow repository:
> * C++ implementation of the proposed specification [4]
> * Python example implementation of the proposed specification and usage
> (only illustrative) [5]
>
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Accept this proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
>
> Regards, Alenka
>
> [1]: https://lists.apache.org/thread/3cj0cr44hg3t2rn0kxly8td82yfob1nd
> [2]: https://github.com/apache/arrow/pull/33925/files
> [3]: https://github.com/apache/arrow/blob/main/docs/source/format/CanonicalExtensions.rst
>
> [4]: https://github.com/apache/arrow/pull/8510/files
> [5]: https://github.com/apache/arrow/pull/33948/files
>