[
https://issues.apache.org/jira/browse/ARROW-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222490#comment-17222490
]
Bryan Cutler commented on ARROW-8714:
-------------------------------------
+1 on the proposal of having a list array for the data (of the same type as the
tensor) and a second array for the shape. For the shape, a list array of ints
would work, but it would also be possible to modify Tensor.fbs slightly to add
a TensorShape message. That might help keep the size down for lots of small
tensors, but I'm not sure it's worth the added complexity.
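
To make the two-array layout concrete, here is a rough pure-Python sketch. It is not the Arrow API: plain lists stand in for the list arrays, and the names pack_tensors and get_tensor are hypothetical, chosen only for illustration. The data is kept as one flat values buffer plus offsets (the way a list array is laid out), and the shapes are kept in a parallel list.

```python
from math import prod


def pack_tensors(tensors):
    """Pack (flat_values, shape) pairs into a data 'list array' and a shape list.

    Hypothetical helper: plain Python lists stand in for Arrow arrays.
    """
    values = []    # all tensor elements, concatenated
    offsets = [0]  # offsets[i]:offsets[i + 1] delimits tensor i in values
    shapes = []    # one list of ints per tensor
    for flat, shape in tensors:
        if len(flat) != prod(shape):
            raise ValueError("element count does not match shape")
        values.extend(flat)
        offsets.append(len(values))
        shapes.append(list(shape))
    return values, offsets, shapes


def get_tensor(values, offsets, shapes, i):
    """Return the flat elements and the shape of tensor i."""
    return values[offsets[i]:offsets[i + 1]], shapes[i]


values, offsets, shapes = pack_tensors([
    ([1, 2, 3, 4, 5, 6], (2, 3)),
    ([7, 8], (1, 2)),
])
```

With many small tensors the shape list holds one small list per row, which is the overhead a dedicated TensorShape message might reduce.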

I also had another thought: if the shape for each tensor included an additional
outer dimension giving the number of records in that entry, we could use a
single tensor extension type for both variable and constant dimensions. For
example, 10 tensors of shape (2, 3) stacked in a single ndarray of shape
(10, 2, 3) would give a shape array with a single entry, {{[(10, 2, 3)]}}. With
10 tensors of varying shapes, each one would have a 1 as its outer dimension,
giving 10 entries: {{[(1, 2, 3), (1, 5, 3), (1, 4, 3), ...]}}. The 1's are a
little redundant in that case, but this would also allow combining smaller
batches: 10 tensors in two groups of 5, each group sharing the same dims, would
give {{[(5, 2, 3), (5, 4, 6)]}}. What do you think of this, [~chrish42] and
[~jorisvandenbossche]?
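
As a rough illustration of the outer-dimension idea, the sketch below collapses runs of consecutive tensors with identical shapes into a single (count, *shape) entry and expands them back. It is plain Python, not part of Arrow; encode_shapes and decode_shapes are hypothetical names, and merging only consecutive runs is an assumption made so that row order is preserved.

```python
from itertools import groupby


def encode_shapes(shapes):
    """Collapse runs of consecutive identical shapes into (count, *shape)."""
    return [(len(list(run)), *shape) for shape, run in groupby(shapes)]


def decode_shapes(entries):
    """Expand (count, *shape) entries back into one shape per tensor."""
    shapes = []
    for count, *shape in entries:
        shapes.extend([tuple(shape)] * count)
    return shapes


constant = encode_shapes([(2, 3)] * 10)
varying = encode_shapes([(2, 3), (5, 3), (4, 3)])
batched = encode_shapes([(2, 3)] * 5 + [(4, 6)] * 5)
```

Here constant is a single entry (10, 2, 3), varying gets a leading 1 on every entry, and batched comes out as two entries, (5, 2, 3) and (5, 4, 6), matching the three cases described above.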
> [C++] Add a Tensor logical value type with varying dimensions, implemented
> using ExtensionType
> ----------------------------------------------------------------------------------------------
>
> Key: ARROW-8714
> URL: https://issues.apache.org/jira/browse/ARROW-8714
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Format
> Reporter: Christian Hudon
> Priority: Major
>
> Support for tensors in Table, RecordBatch, etc. where each row is a tensor of
> a different shape (e.g. images of different sizes), but of the same underlying
> type (e.g. int32). Implemented as an ExtensionType, so no need to change the
> format.
> I don't foresee needing rows whose tensors have different numbers of
> dimensions, so if support for that falls out easily from the implementation
> of the use case where every row's tensor has the same number of
> dimensions, great. If it adds a lot of complexity, that case would be
> postponed.