[GitHub] [arrow] thomasw21 commented on a diff in pull request #33925: GH-33923: [Docs] Tensor canonical extension type specification

via GitHub Wed, 01 Feb 2023 09:01:40 -0800


thomasw21 commented on code in PR #33925:
URL: https://github.com/apache/arrow/pull/33925#discussion_r1093500508



##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -72,4 +72,30 @@ same rules as laid out above, and provide backwards compatibility guarantees.
 Official List
 =============
 
-No canonical extension types have been standardized yet.
+Fixed shape tensor
+==================
+
+* Extension name: `arrow.fixed_shape_tensor`.
+
+* The storage type of the extension: ``FixedSizeList`` where:
+
+  * **value_type** is the data type of individual tensors and
+    is an instance of ``pyarrow.DataType`` or ``pyarrow.Field``.
+  * **list_size** is the product of all the elements in tensor shape.
+
+* Extension type parameters:
+
+  * **value_type** = Arrow DataType of the tensor elements
+  * **shape** = shape of the contained tensors as a tuple
+  * **is_row_major** = boolean indicating the order of elements
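To illustrate the storage layout described in the diff, the sketch below packs a batch of equally shaped tensors into one flat values buffer. This is how a `FixedSizeList` whose `list_size` is the product of the shape would hold them. It uses plain Python lists instead of pyarrow and assumes row-major order. The helper names (`list_size_for`, `flatten_row_major`, `pack_tensors`, `tensor_slot`) are hypothetical and not part of any Arrow API.

```python
from math import prod

def list_size_for(shape):
    # Each FixedSizeList slot holds every element of one tensor,
    # so list_size is the product of the shape dimensions.
    return prod(shape)

def flatten_row_major(nested):
    # Recursively flatten a nested Python list in row-major (C) order:
    # the last index varies fastest.
    if isinstance(nested, list):
        out = []
        for item in nested:
            out.extend(flatten_row_major(item))
        return out
    return [nested]

def pack_tensors(tensors, shape):
    # Concatenate the flattened tensors into one "values" buffer,
    # mimicking the child array of a FixedSizeList.
    size = list_size_for(shape)
    values = []
    for t in tensors:
        flat = flatten_row_major(t)
        if len(flat) != size:
            raise ValueError("tensor has the wrong number of elements for the declared shape")
        values.extend(flat)
    return values

def tensor_slot(values, shape, i):
    # Tensor i occupies values[i * list_size : (i + 1) * list_size].
    size = list_size_for(shape)
    return values[i * size:(i + 1) * size]

shape = (2, 3)
tensors = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
values = pack_tensors(tensors, shape)
print(list_size_for(shape))            # 6
print(tensor_slot(values, shape, 1))   # [7, 8, 9, 10, 11, 12]
```

In pyarrow, the flat values would instead be an Arrow array wrapped with `pa.FixedSizeListArray.from_arrays(values, list_size)`. The offset arithmetic is the same.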

Review Comment:
   > Yes. We would just use the physical layout of the source and not change memory layout when going in and out of the extension. We would provide an option to store the layout as metadata.
   
   This makes sense. I do think that, in terms of UX, it helps for users to be agnostic of those kinds of considerations, so having the format handle the layout under the hood is an advantage. But it also makes sense to keep the design simple.
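   To make the `is_row_major` parameter concrete: the flag only changes how a multi-dimensional index maps to a position inside one tensor's slot. The data itself is not rearranged. This is a minimal sketch that assumes the flag is a plain boolean, as in the diff. `flat_offset` is a hypothetical helper, not part of any Arrow API.

```python
def flat_offset(index, shape, is_row_major=True):
    # Map a multi-dimensional index to a position inside one tensor's
    # FixedSizeList slot.
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    dims = list(zip(index, shape))
    if is_row_major:
        # Row-major (C order): the last dimension varies fastest,
        # so walk the dimensions from last to first.
        dims.reverse()
    # Column-major (Fortran order) walks from first to last:
    # the first dimension varies fastest.
    offset, step = 0, 1
    for i, n in dims:
        if not 0 <= i < n:
            raise IndexError("index out of bounds")
        offset += i * step
        step *= n
    return offset

shape = (2, 3)
print(flat_offset((1, 0), shape, is_row_major=True))   # 3
print(flat_offset((1, 0), shape, is_row_major=False))  # 1
```

   Both orders visit every position in the slot exactly once. The flag only records which permutation the producer used, so it can be stored as metadata without touching the buffer.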
   
   > My understanding is that contiguous tensors in torch are indeed always row-major, but so that also means that if you have such a contiguous tensor, you don't need any copy to put this in the proposed extension TensorArray (or you can get it out without a copy).
   
   Yes, for contiguous tensors it's perfect. The proposal to also store `stride` would make the non-contiguous case zero-copy as well.
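   This sketch shows why storing strides alongside the shape could allow zero-copy for non-contiguous tensors. A strided view reads elements straight out of the original buffer, so a transposed tensor can be described by swapping its strides instead of moving data. Strides here are counted in elements, as PyTorch does; NumPy counts them in bytes. `stride` is only a proposal in this thread, and the helpers below are hypothetical.

```python
def contiguous_strides(shape):
    # Row-major strides in elements: the last dimension has stride 1.
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return tuple(strides)

def is_contiguous(shape, strides):
    # A tensor fits the row-major FixedSizeList layout without a copy
    # only if its strides match the contiguous row-major strides.
    return tuple(strides) == contiguous_strides(shape)

def read(buffer, strides, index, offset=0):
    # Read one element of a strided view directly from the shared buffer.
    return buffer[offset + sum(i * s for i, s in zip(index, strides))]

buffer = [1, 2, 3, 4, 5, 6]                 # a 2x3 row-major tensor
shape, strides = (2, 3), contiguous_strides((2, 3))
print(strides, is_contiguous(shape, strides))   # (3, 1) True

# Transposed 3x2 view of the same buffer: swap shape and strides, copy nothing.
t_shape, t_strides = (3, 2), (1, 3)
print(is_contiguous(t_shape, t_strides))    # False
print(read(buffer, t_strides, (2, 1)))      # 6, same as element (1, 2) of the original
```

   Without a strides parameter, the transposed view would have to be materialized into a new row-major buffer before being stored in the extension array.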


