thomasw21 commented on code in PR #33925: URL: https://github.com/apache/arrow/pull/33925#discussion_r1094350467
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -72,4 +72,30 @@ same rules as laid out above, and provide backwards compatibility guarantees.
 Official List
 =============

-No canonical extension types have been standardized yet.
+Fixed shape tensor
+==================
+
+* Extension name: `arrow.fixed_shape_tensor`.
+
+* The storage type of the extension: ``FixedSizeList`` where:
+
+  * **value_type** is the data type of individual tensors and
+    is an instance of ``pyarrow.DataType`` or ``pyarrow.Field``.
+  * **list_size** is the product of all the elements in tensor shape.
+
+* Extension type parameters:
+
+  * **value_type** = Arrow DataType of the tensor elements
+  * **shape** = shape of the contained tensors as a tuple
+  * **is_row_major** = boolean indicating the order of elements

Review Comment:
   Sometimes users manipulate non-contiguous formats without realising it. Typically, `torch` uses `to(memory_format=torch.channels_last)`, which is essentially equivalent to `.permute(0, 2, 3, 1).contiguous()`. This API is often used in computer vision, as some operations run faster when the tensors are not actually stored in "row major" fashion. Loading images directly in that format would therefore help, which means storing non-row-major tensors.
   ```python
   import torch

   device = torch.device("cpu")
   x = torch.randn(2, 3, 5, 7, device=device)
   y = x.to(memory_format=torch.channels_last)

   # Show that x and y do not share memory
   do_share_memory = x.data_ptr() == y.data_ptr()
   print(f"Channels Last shares memory: {do_share_memory}")

   # Show that the channels-last tensor is not contiguous
   print(f"Channels Last is contiguous: {y.is_contiguous()}")

   # Its data storage is equivalent to a contiguous NHWC tensor
   z = x.permute(0, 2, 3, 1).contiguous()
   is_storage_equal = all(z_data == y_data for z_data, y_data in zip(z.storage(), y.storage()))
   print(f"Channels Last storage is the same as the NHWC format: {is_storage_equal}")
   ```

   More on this: https://pytorch.org/blog/accelerating-pytorch-vision-models-with-channels-last-on-cpu/

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
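   For context, the ``FixedSizeList`` storage that the diff above proposes can already be sketched with today's `pyarrow` API. This is a minimal sketch of the storage layout only, not the final extension type: the `shape` and `is_row_major` parameters discussed in this thread would live in the extension metadata, which is not shown here.

   ```python
   import numpy as np
   import pyarrow as pa

   # A batch of two tensors, each of shape (2, 3), so
   # list_size = 2 * 3 = 6 as described in the diff above.
   batch = np.arange(12, dtype=np.float32).reshape(2, 2, 3)

   # Flatten each tensor in row-major (C) order into the child values;
   # a non-row-major layout would flatten in a different element order.
   values = pa.array(batch.reshape(-1))
   storage = pa.FixedSizeListArray.from_arrays(values, 6)

   print(storage.type)   # a fixed_size_list of float with list_size 6
   print(len(storage))   # 2, one list entry per tensor
   ```

   Round-tripping back to tensors is then just a reshape of the flat child values, which is why the element order (`is_row_major`) must be recorded somewhere.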