jorisvandenbossche commented on code in PR #33925: URL: https://github.com/apache/arrow/pull/33925#discussion_r1107614102
########## docs/source/format/CanonicalExtensions.rst: ########## @@ -72,4 +72,65 @@ same rules as laid out above, and provide backwards compatibility guarantees. Official List ============= -No canonical extension types have been standardized yet. +Fixed shape tensor +================== + +* Extension name: `arrow.fixed_shape_tensor`. + +* The storage type of the extension: ``FixedSizeList`` where: + + * **value_type** is the data type of individual tensors and + is an instance of ``pyarrow.DataType`` or ``pyarrow.Field``. + * **list_size** is the product of all the elements in tensor shape. + +* Extension type parameters: + + * **value_type** = Arrow DataType of the tensor elements + * **shape** = shape of the contained tensors as an array + + Optional parameters: + + * **dim_names** = explicit names to tensor dimensions + as an array. The length of it should be equal to the shape + length and equal to the number of dimensions. + + ``dim_names`` can be used if the dimensions have well-known + names and they map to the physical layout (row-major). + + * **permutation** = indices of the desired ordering of the + original dimensions, defined as an array. + + The indices contain a permutation of the values [0, 1, .., N-1] where + N is the number of dimensions. The permutation indicates which + dimension of the logical layout corresponds to which dimension of the + physical tensor (the i-th dimension of the logical view corresponds + to the dimension with number ``permutations[i]`` of the physical tensor). + + Permutation can be useful in case the logical order of + the tensor is a permutation of the physical order (row-major). Review Comment: Indeed, we still follow the underlying storage type, so operations like slicing, filtering, take etc all work correctly (because indeed otherwise it could not be done as an extension type, but would have to be a new core type). And that's also the reason that the tensor has to be row-major instead of column-major (because otherwise it would violate the integrity of individual elements of the array). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
