lhoestq commented on code in PR #33925:
URL: https://github.com/apache/arrow/pull/33925#discussion_r1094440818


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -72,4 +72,30 @@ same rules as laid out above, and provide backwards compatibility guarantees.
 Official List
 =============
 
-No canonical extension types have been standardized yet.
+Fixed shape tensor
+==================
+
+* Extension name: `arrow.fixed_shape_tensor`.
+
+* The storage type of the extension: ``FixedSizeList`` where:

Review Comment:
   Just throwing ideas here - please ignore if it doesn't make sense. I'm not 
familiar enough with the constraints that you have for canonical extension 
types.
   
   What if, in addition to the extension type with `shape` and `value_type`,
there were an extension array that stores the storage and `strides` (the more
general way to interpret a NumPy array's storage, if I understand correctly), as
well as an offset/length in case the array is sliced (because it may not be
trivial to slice the storage)? Dimension names could be an optional parameter
of the extension type for the computer vision folks.
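A minimal pure-Python sketch of that idea, just to make it concrete. `StridedTensorArray` and its fields are hypothetical names, not an Arrow API. Strides are counted in elements rather than bytes (NumPy's `strides` are in bytes), and each tensor is assumed to occupy `prod(shape)` consecutive slots of the flat storage, as in a FixedSizeList:

```python
from array import array
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple


@dataclass
class StridedTensorArray:
    """Hypothetical: a batch of fixed-shape tensors over one flat buffer."""
    storage: memoryview            # flat child values, shared, never copied
    shape: Tuple[int, ...]         # per-tensor shape, e.g. (2, 3)
    strides: Tuple[int, ...]       # per-tensor strides, in elements (assumption)
    offset: int = 0                # index of the first tensor in this slice
    length: Optional[int] = None   # number of tensors in this slice

    def __post_init__(self):
        if self.length is None:
            self.length = len(self.storage) // prod(self.shape) - self.offset

    def get(self, i, index):
        # Locate tensor i, then apply the strides to the multi-dimensional index.
        if not 0 <= i < self.length:
            raise IndexError(i)
        base = (self.offset + i) * prod(self.shape)
        return self.storage[base + sum(k * s for k, s in zip(index, self.strides))]

    def slice(self, start, stop):
        # Slicing only adjusts offset/length; the underlying buffer is shared.
        return StridedTensorArray(self.storage, self.shape, self.strides,
                                  self.offset + start, stop - start)


# Two 2x3 tensors flattened into one buffer, read with row-major strides.
values = array("d", range(12))
tensors = StridedTensorArray(memoryview(values), (2, 3), (3, 1))
second = tensors.slice(1, 2)   # no copy of `values`
```

The same buffer could be reinterpreted with column-major strides `(1, 2)` without touching the data, which is the flexibility that storing `strides` alongside the storage would buy.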
   
   That means we could read the Arrow array as a NumPy array with zero copy
(and I guess into PyTorch as well). Making an Arrow array from a tensor whose
storage doesn't fit a FixedSizeList would still require rewriting the storage,
though.
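A rough sketch of that rewrite, assuming the FixedSizeList values hold each tensor in row-major (C) order, which this hunk doesn't spell out, and again counting strides in elements. Both function names are hypothetical. If the strides already describe a row-major layout, the input is reused as-is; otherwise the values are gathered into a new list:

```python
from itertools import product


def row_major_strides(shape):
    # Element strides of a C-contiguous (row-major) layout.
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def to_fixed_size_list_values(storage, shape, strides):
    """Return one tensor's values in row-major order (hypothetical helper)."""
    if tuple(strides) == row_major_strides(shape):
        return storage  # already laid out as FixedSizeList expects: no copy
    # Otherwise walk the indices in row-major order and gather each value.
    return [storage[sum(i * s for i, s in zip(idx, strides))]
            for idx in product(*(range(d) for d in shape))]


# A 2x3 tensor [[0, 1, 2], [3, 4, 5]] stored in column-major (Fortran) order.
rewritten = to_fixed_size_list_values([0, 3, 1, 4, 2, 5], (2, 3), (1, 2))
```

This ignores details such as padding and size-1 dimensions with arbitrary strides; it only illustrates why a non-row-major source forces a copy.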
   
   This should also allow concatenating tensors with the same shape but different
memory formats. On the other hand, I'm not sure it's possible to get a NumPy
array with zero copy from tensors with mixed memory formats.


