[ https://issues.apache.org/jira/browse/ARROW-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863058#comment-15863058 ]
Philipp Moritz commented on ARROW-550:
--------------------------------------
We do need tensors nested within other types, but supporting only tensors of
primitive types is fine. We have written our own sequence type to support
nesting within lists or dicts; see
https://github.com/ray-project/ray/blob/master/src/numbuf/cpp/src/numbuf/sequence.h.
One thing that comes up a lot, for example in deep learning, is dictionaries of
tensors (these are weight collections for neural networks).
Yes, fixed-size types are what we need. To handle dtype=object, we convert the
tensors to lists and then use our sequence type.
Can you clarify what exactly you mean regarding the 2G limitation? If tensors
larger than 2G are not accessible from Java, that'd be OK for us for now.
Also, using arrow::DataType sounds good!
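To make the nesting requirement concrete, here is a minimal sketch of the idea. It is not the numbuf implementation: the `Tensor` record, the `encode` function, and the tagged-tuple output are all hypothetical names chosen for illustration. Tensors of primitive type stay as fixed-size leaves, while a dtype=object tensor is first turned into nested lists and then handled like any other sequence.

```python
from collections import namedtuple

# Hypothetical stand-in for a tensor: dtype name, shape, flat data in C order.
Tensor = namedtuple("Tensor", ["dtype", "shape", "data"])


def to_nested_list(data, shape):
    """Unflatten C-order data into nested lists that follow `shape`."""
    if len(shape) == 0:
        return data[0]
    if len(shape) == 1:
        return list(data[:shape[0]])
    step = len(data) // shape[0] if shape[0] else 0
    return [to_nested_list(data[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def encode(value):
    """Walk nested dicts/lists and tag every node (illustrative format only)."""
    # Tensor must be checked first because a namedtuple is also a tuple.
    if isinstance(value, Tensor):
        if value.dtype == "object":
            # Object tensors are not fixed-size: fall back to nested lists.
            return encode(to_nested_list(value.data, value.shape))
        return ("tensor", value.dtype, tuple(value.shape), list(value.data))
    if isinstance(value, dict):
        return ("dict", [(key, encode(item)) for key, item in value.items()])
    if isinstance(value, (list, tuple)):
        return ("list", [encode(item) for item in value])
    return ("scalar", value)


# A small "weight collection": one primitive tensor, one object tensor.
weights = {
    "w": Tensor("float32", (2, 2), [1.0, 2.0, 3.0, 4.0]),
    "names": Tensor("object", (2,), ["a", "b"]),
}
encoded = encode(weights)
```

In this sketch only the `"tensor"` leaves would need a tensor message; everything else is ordinary nested sequence data.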
> [Format] Add a TensorMessage type
> ---------------------------------
>
> Key: ARROW-550
> URL: https://issues.apache.org/jira/browse/ARROW-550
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Format
> Reporter: Wes McKinney
>
> Since all data message types at the moment are 1-dimensional, a "tensor"
> message will contain an array of dimensions and an order flag (C order vs.
> Fortran order) to enable data to be interpreted as multiple dimensions. This
> is similar to multidimensional arrays in APL or Fortran or MATLAB, ndarrays
> in NumPy, etc.
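As a rough illustration of the description above (not the proposed message format itself), the following sketch shows how a 1-dimensional buffer plus an array of dimensions and an order flag is enough to address elements in multiple dimensions. The function names `strides` and `element` are assumptions made for this example, and strides are counted in elements rather than bytes.

```python
def strides(shape, order="C"):
    """Element strides for a dense tensor with the given shape and order."""
    result = [0] * len(shape)
    step = 1
    # C order: the last dimension varies fastest.
    # Fortran order: the first dimension varies fastest.
    if order == "C":
        axes = range(len(shape) - 1, -1, -1)
    else:
        axes = range(len(shape))
    for axis in axes:
        result[axis] = step
        step *= shape[axis]
    return result


def element(flat, shape, index, order="C"):
    """Look up a multi-dimensional index in a flat 1-dimensional buffer."""
    offset = sum(i * s for i, s in zip(index, strides(shape, order)))
    return flat[offset]


flat = [0, 1, 2, 3, 4, 5]
shape = (2, 3)

# The same six values read as a 2x3 tensor under each order flag.
c_view = [[element(flat, shape, (i, j), "C") for j in range(3)] for i in range(2)]
f_view = [[element(flat, shape, (i, j), "F") for j in range(3)] for i in range(2)]
```

Here `c_view` is `[[0, 1, 2], [3, 4, 5]]` and `f_view` is `[[0, 2, 4], [1, 3, 5]]`: the buffer is identical, only the interpretation differs.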