[
https://issues.apache.org/jira/browse/ARROW-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17100249#comment-17100249
]
Christian Hudon commented on ARROW-1614:
----------------------------------------
This is a blocker for some Arrow use cases for us, so I'd be willing to work on
this, with a bit of guidance. The first step would be to agree on the approach
to take.
For me, there are two cases I'd need Arrow to support:
# Each row is a tensor of a different shape (e.g. images of different sizes),
but of the same underlying type (e.g. int32). I don't foresee needing rows
whose tensors have different numbers of dimensions, so that could be out of
scope if desired.
# All rows have the same shape (so the whole column could potentially be
handed off to e.g. scikit-image, as an array of n images of the same size).
From what I understand of Arrow, here's how I would implement this:
# A first column containing the elements from all the tensors (in row-major
order), and a second containing a tuple with that tensor's shape. The start
offset of the data for the next tensor can be computed from the shape of the
previous one. (Would we also need a separate column containing the
pre-computed start index for each tensor?)
# Similarly, the data from the tensors would be stored all together in
row-major order. The shape (without the first dimension) would be stored in the
metadata for that column.
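To make the indexing concrete, here is a minimal pure-Python sketch of both
layouts. It uses plain lists rather than Arrow arrays, and the helper names
(pack_variable, get_variable, get_fixed) are hypothetical, not existing Arrow
APIs. It also assumes the pre-computed offsets are kept alongside the shapes,
which is one possible answer to the question in the first item.

```python
from itertools import accumulate
from math import prod


def pack_variable(tensors):
    """Case 1: pack (shape, flat_row_major_values) pairs into shared storage.

    Returns the concatenated values, the per-row shapes, and the
    pre-computed start offsets (with the total length as a final entry).
    """
    values, shapes = [], []
    for shape, flat in tensors:
        if len(flat) != prod(shape):
            raise ValueError("value count does not match shape")
        values.extend(flat)
        shapes.append(tuple(shape))
    # offsets[i] is where tensor i starts; each size is the product of its shape.
    offsets = [0] + list(accumulate(prod(s) for s in shapes))
    return values, shapes, offsets


def get_variable(values, shapes, offsets, i):
    """Return (shape, flat values) for row i of the variable-shape layout."""
    return shapes[i], values[offsets[i]:offsets[i + 1]]


def get_fixed(values, item_shape, i):
    """Case 2: every row has item_shape, so the offset is i * prod(item_shape).

    item_shape plays the role of the column-level metadata (the shape
    without the first dimension).
    """
    size = prod(item_shape)
    return values[i * size:(i + 1) * size]


# Case 1: a 2x3 tensor followed by a 1x2 tensor, same element type.
values, shapes, offsets = pack_variable([
    ((2, 3), [1, 2, 3, 4, 5, 6]),
    ((1, 2), [7, 8]),
])

# Case 2: two 2x2 tensors stored back to back; the shape lives in metadata.
fixed_values = [1, 2, 3, 4, 5, 6, 7, 8]
item_shape = (2, 2)
```

With the offsets stored, reading row i in case 1 is a constant-time slice.
Without them, a reader would have to accumulate the sizes of all preceding
shapes to find where row i starts.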
Thoughts?
> [C++] Add a Tensor logical value type implemented using ExtensionType
> ---------------------------------------------------------------------
>
> Key: ARROW-1614
> URL: https://issues.apache.org/jira/browse/ARROW-1614
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Format
> Reporter: Wes McKinney
> Priority: Major
>
> In an Arrow table, conceivably a column could have value cells each
> containing a tensor value of some size (a binary value plus some metadata to
> store type and shape/strides)