rok commented on code in PR #37166:
URL: https://github.com/apache/arrow/pull/37166#discussion_r1317182293
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -148,6 +148,76 @@ Fixed shape tensor
by this specification. Instead, this extension type lets one use fixed shape tensors
as elements in a field of a RecordBatch or a Table.
+.. _variable_shape_tensor_extension:
+
+Variable shape tensor
+=====================
+
+* Extension name: `arrow.variable_shape_tensor`.
+
+* The storage type of the extension is ``StructArray``, where the struct
+ is composed of **data** and **shape** fields describing a single
+ tensor per row:
+
+ * **data** is a ``List`` holding the elements of a single tensor.
+ The data type of the list elements is uniform across the entire column
+ and is also provided in metadata.
+ * **shape** is a ``FixedSizeList<uint32>[ndim]`` of the tensor shape where
+ the size of the list ``ndim`` is equal to the number of dimensions of the
+ tensor.
Review Comment:
Are you saying that you get `ndim` at query time and expect `ndim` to be constant
over the entire array, or that every row can potentially have a different
`ndim`? If the latter, we could use this storage type:
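```cpp
// value_type is the tensor element type, e.g. arrow::float32()
arrow::struct_({arrow::field("shape", arrow::list(arrow::uint32())), arrow::field("data", arrow::list(value_type))})
```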
That would work if we can assume a consistent memory layout (e.g. always
row-major). However, if we want to allow other memory layouts, we would also
need to store a per-row permutation or strides.