paleolimbot commented on code in PR #37166:
URL: https://github.com/apache/arrow/pull/37166#discussion_r1317269300


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -148,6 +148,76 @@ Fixed shape tensor
   by this specification. Instead, this extension type lets one use fixed shape tensors
   as elements in a field of a RecordBatch or a Table.
 
+.. _variable_shape_tensor_extension:
+
+Variable shape tensor
+=====================
+
+* Extension name: `arrow.variable_shape_tensor`.
+
+* The storage type of the extension is: ``StructArray`` where struct
+  is composed of **data** and **shape** fields describing a single
+  tensor per row:
+
+  * **data** is a ``List`` holding tensor elements of a single tensor.
+    Data type of the list elements is uniform across the entire column
+    and also provided in metadata.
+  * **shape** is a ``FixedSizeList<uint32>[ndim]`` of the tensor shape where
+    the size of the list ``ndim`` is equal to the number of dimensions of the
+    tensor.

Review Comment:
   Yes, there is potentially a different `ndim` for each item in the array. I 
imagine that in practice this does not occur frequently, but at the time we 
resolve the Arrow output type we don't have any actual data to inspect, so we 
can't guess. Opening that can of worms would be hard, but we might have to do 
it for other reasons, too (e.g., guessing decimal output precision/bitwidth, 
since that can vary by row as well in Postgres).
   
   I don't want the discussion to get *too* hung up on this point if it makes 
life more difficult. If I had to choose between allowing `ndim` to vary among 
items in the array and consistency with the `fixed_shape_tensor()`, I would 
pick consistency with the fixed shape tensor! There are other considerations 
for returning arrays from Postgres to Arrow (for example, if `ndim` is 1, a 
more intuitive output type would be a plain `List`); my initial comment of "we 
won't be able to use this" is probably not true.
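
To make the constraint concrete, here is a minimal sketch in plain Python. It makes no Arrow or Postgres calls; `uniform_ndim`, `pick_output_type`, and the returned labels are hypothetical names used only for illustration. It assumes the per-row shapes are already available, which is exactly what we do not have when the output type is resolved.

```python
from typing import Optional, Sequence


def uniform_ndim(shapes: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the number of dimensions shared by all rows, or None if rows disagree."""
    ndims = {len(shape) for shape in shapes}
    return ndims.pop() if len(ndims) == 1 else None


def pick_output_type(shapes: Sequence[Sequence[int]]) -> str:
    """Hypothetical helper: choose an output type label from per-row shapes."""
    ndim = uniform_ndim(shapes)
    if ndim is None:
        # Rows disagree on ndim (or there are no rows), so a single
        # FixedSizeList<uint32>[ndim] shape field cannot describe them.
        return "unsupported: no single ndim"
    if ndim == 1:
        # One-dimensional arrays map more naturally to a plain List.
        return "list"
    return f"arrow.variable_shape_tensor (ndim={ndim})"
```

For example, `pick_output_type([[2, 3], [4, 5]])` picks the tensor type, while `pick_output_type([[2, 3], [4]])` reports that no single `ndim` fits every row.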


