paleolimbot commented on code in PR #37166: URL: https://github.com/apache/arrow/pull/37166#discussion_r1310524404
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -148,6 +148,76 @@ Fixed shape tensor
 by this specification. Instead, this extension type lets one use fixed shape tensors as elements in a field of a RecordBatch or a Table.
+.. _variable_shape_tensor_extension:
+
+Variable shape tensor
+=====================
+
+* Extension name: `arrow.variable_shape_tensor`.
+
+* The storage type of the extension is: ``StructArray`` where struct
+  is composed of **data** and **shape** fields describing a single
+  tensor per row:
+
+  * **data** is a ``List`` holding tensor elements of a single tensor.
+    Data type of the list elements is uniform across the entire column
+    and also provided in metadata.
+  * **shape** is a ``FixedSizeList<uint32>[ndim]`` of the tensor shape where
+    the size of the list ``ndim`` is equal to the number of dimensions of the
+    tensor.

Review Comment:
   Is it necessary for `ndim` to be known in advance? If it is not (i.e., if the storage type is defined instead as a `List<uint32>[ndim]`), we can return this type from the ADBC Postgres driver (here: https://github.com/apache/arrow-adbc/blob/main/c/driver/postgresql/postgres_copy_reader.h#L471-L534 ). If it is essential that the number of dimensions is known at the schema level it is OK too (we just won't be able to actually use it in the postgres driver).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
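To make the `ndim` question above concrete, here is a minimal pure-Python sketch. It does not use the Arrow or ADBC APIs: the `(data, shape)` row model and the helpers `check_row` and `common_ndim` are hypothetical names introduced only for illustration. It shows that a fixed-size shape list needs every row to have the same number of dimensions, while a variable-length shape list would not.

```python
from math import prod

# Hypothetical row model: each row is (data, shape), mirroring the
# **data** and **shape** fields of the proposed struct storage.


def check_row(data, shape):
    """Return True when the shape's element count matches len(data)."""
    return prod(shape) == len(data)


def common_ndim(rows):
    """Return the number of dimensions shared by all rows, or None.

    A fixed-size shape list needs a single ndim for the whole column;
    None means the rows disagree (or there are no rows), so only a
    variable-length shape list could hold them.
    """
    ndims = {len(shape) for _, shape in rows}
    return ndims.pop() if len(ndims) == 1 else None


# Every row is 2-D, so a fixed-size shape list with ndim = 2 would fit.
same_rank = [([1, 2, 3, 4, 5, 6], [2, 3]), ([7, 8], [1, 2])]

# Rows mix 2-D and 1-D tensors, so no single ndim describes the column.
mixed_rank = [([1, 2, 3, 4], [2, 2]), ([5, 6, 7], [3])]

print(common_ndim(same_rank))
print(common_ndim(mixed_rank))
```

Under this model, a source that only reports the number of dimensions per value would correspond to the `mixed_rank` case, which is the situation the comment describes for the Postgres driver.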
