rok commented on PR #37166: URL: https://github.com/apache/arrow/pull/37166#issuecomment-1708235013
We could split this into two types:

`VariableShapeTensor`, which has `ndim=const` and stores the permutation and the list of ragged dimensions per array. This is less general, but it uses `fixed_size_list` for the shape and saves storage by not storing strides. It does incur a computation cost whenever strides need to be calculated. This layout would map to [tf.RaggedTensor](https://www.tensorflow.org/api_docs/python/tf/RaggedTensor). Storage would be:

```cpp
struct_({field("shape", fixed_size_list(uint32(), ndim)),
         field("data", list(value_type))})
```

`VariableDimensionTensor`, which has `ndim!=const` and stores strides (or permutations) per row. This is more general, but it adds storage cost by storing strides (and by keeping shapes in a `list`). It would require less computation, as strides would not have to be calculated at use time. This layout would map well to [torch.Nested](https://pytorch.org/docs/stable/nested.html), but would be more general as it would allow an arbitrary ndim per row.

```cpp
struct_({field("shape", list(uint32())),
         field("strides", list(int64())),
         field("data", list(value_type))})
```

cc @pitrou
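To illustrate the computation cost mentioned for the first layout, here is a minimal sketch of deriving byte strides from a per-row shape and a per-array permutation. `ComputeStrides` is a hypothetical helper, not an Arrow API. It assumes that `shape` is the physical (row-major) shape, that `permutation[i]` names the physical dimension backing logical dimension `i`, and that `byte_width` is the size of one value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Arrow): byte strides of the logical view.
// Assumed convention: `shape` is the physical row-major shape and
// `permutation[i]` is the physical dimension backing logical dimension i.
std::vector<int64_t> ComputeStrides(const std::vector<uint32_t>& shape,
                                    const std::vector<int64_t>& permutation,
                                    int64_t byte_width) {
  const std::size_t ndim = shape.size();

  // Row-major strides of the physical layout, innermost dimension first.
  std::vector<int64_t> physical(ndim);
  int64_t stride = byte_width;
  for (std::size_t i = ndim; i-- > 0;) {
    physical[i] = stride;
    stride *= static_cast<int64_t>(shape[i]);
  }

  // Reorder the physical strides into logical dimension order.
  std::vector<int64_t> logical(ndim);
  for (std::size_t i = 0; i < ndim; ++i) {
    logical[i] = physical[static_cast<std::size_t>(permutation[i])];
  }
  return logical;
}
```

Under these assumptions, a row with physical shape `{2, 3, 4}` of 4-byte values yields strides `{48, 16, 4}` with the identity permutation and `{4, 48, 16}` with permutation `{2, 0, 1}`. This is the O(ndim) work per row that the second layout avoids by storing strides explicitly.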
