[GitHub] [arrow] rok commented on pull request #37166: GH-24868: [C++] Add a Tensor logical value type with varying dimensions, implemented using ExtensionType

via GitHub Wed, 06 Sep 2023 05:16:16 -0700


rok commented on PR #37166:
URL: https://github.com/apache/arrow/pull/37166#issuecomment-1708235013


   We could split this into two types:
   `VariableShapeTensor` that has `ndim=const` and stores the permutation and the 
ragged dimension list per array. This is less general but uses `fixed_size_list` 
for shape and saves storage by not storing strides. It does incur a computation 
cost whenever strides need to be calculated. This layout would map to 
[tf.RaggedTensor](https://www.tensorflow.org/api_docs/python/tf/RaggedTensor). 
Storage would be:
    ```cpp
    struct_({field("shape", fixed_size_list(uint32(), ndim)),
             field("data", list(value_type))})
    ```
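
To make the "strides need to be calculated" cost concrete, here is a minimal sketch of deriving byte strides for one row from its shape and the per-array permutation. `ComputeStrides` is a hypothetical helper, not an Arrow API, and it assumes that `permutation[i]` names the logical dimension stored at physical position `i` (so the last entry is the contiguous dimension); the actual convention in the PR may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): derive byte strides for a single
// tensor from its shape, a dimension permutation, and the element byte width.
// Assumption: permutation[i] is the logical dimension stored at physical
// position i, so the last entry is the fastest-varying dimension.
std::vector<int64_t> ComputeStrides(const std::vector<uint32_t>& shape,
                                    const std::vector<size_t>& permutation,
                                    int64_t byte_width) {
  assert(shape.size() == permutation.size());
  std::vector<int64_t> strides(shape.size(), 0);
  int64_t running = byte_width;
  // Walk the physical layout from the innermost dimension outwards.
  for (size_t i = permutation.size(); i-- > 0;) {
    const size_t dim = permutation[i];
    strides[dim] = running;
    running *= static_cast<int64_t>(shape[dim]);
  }
  return strides;
}
```

This is O(ndim) work per row, which is the computation the fixed-`ndim` layout trades for not storing strides.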
   
   `VariableDimensionTensor` that has `ndim!=const` and stores strides (or 
permutations) per row. This is more general, but would add extra storage cost 
by storing strides (and keeping shapes in `list`). It would require less 
computation as strides would not have to be calculated at use time. This layout 
would map well to [torch.Nested](https://pytorch.org/docs/stable/nested.html), 
but would be more general as it would allow arbitrary ndim per row.
    ```cpp
    struct_({field("shape", list(uint32())),
             field("strides", list(int64())),
             field("data", list(value_type))})
   ```
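
With strides stored per row, locating an element is a plain dot product and needs no knowledge of the shape or permutation. A minimal sketch, assuming strides are expressed in bytes; `ElementOffset` is a hypothetical helper, not an Arrow API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): byte offset of the element at
// `index` within one row's data buffer, using that row's stored strides.
int64_t ElementOffset(const std::vector<int64_t>& strides,
                      const std::vector<int64_t>& index) {
  assert(strides.size() == index.size());
  int64_t offset = 0;
  for (size_t i = 0; i < strides.size(); ++i) {
    offset += strides[i] * index[i];
  }
  return offset;
}
```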
   
   cc @pitrou 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
