[ 
https://issues.apache.org/jira/browse/ARROW-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863035#comment-15863035
 ] 

Wes McKinney commented on ARROW-550:
------------------------------------

I will look into the links you provided, thank you. A couple of immediate 
follow-up questions:

* Do you have any need for nested types with tensors? 
* When you say "all NumPy types", do you mean all fixed-size data types 
excluding {{dtype=object}} arrays? 

Having tensor metadata that supports more elements than an int32_t can count 
is not a problem. Expanding normal Arrow arrays to exceed 2G, on the other 
hand, would be a challenging proposition because Java does not support large 
buffers very well. I suspect that the tensor functionality would be limited 
to native code users.
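
To make the size concern concrete, here is a minimal sketch, assuming the "2G" 
limit refers to the largest signed 32-bit value (2^31 - 1). The helper name 
{{fits_in_int32}} is hypothetical and not part of any Arrow API; it only 
checks whether a tensor's total element count could be stored in an int32_t.

```python
from math import prod

# Largest value a signed 32-bit integer can hold (assumed meaning of "2G").
INT32_MAX = 2**31 - 1


def fits_in_int32(shape):
    """Return True if the total element count of `shape` fits in an int32_t.

    Hypothetical helper for illustration only; not an Arrow function.
    """
    return prod(shape) <= INT32_MAX


# A 1024 x 1024 tensor has 1,048,576 elements.
print(fits_in_int32((1024, 1024)))
# A 50,000 x 50,000 tensor has 2,500,000,000 elements.
print(fits_in_int32((50_000, 50_000)))
```

A tensor that is modest along each dimension can still exceed the limit in 
total, which is why the element count in the metadata may want a 64-bit type.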

As a first draft implementation in the Arrow C++ libraries, I would add 
additional metadata for tensors and define tensor types that reuse the common 
metadata types defined in {{arrow::DataType}}. 
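
As a rough illustration of what such metadata might carry, here is a sketch in 
Python rather than C++, based on the issue description (an array of 
dimensions plus a C vs. Fortran order flag). The class {{TensorMeta}} and all 
of its fields are hypothetical; {{type_name}} and {{item_size}} merely stand 
in for a fixed-size {{arrow::DataType}}. This is not the actual Arrow design.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical tensor metadata over a flat, 1-dimensional buffer."""

    type_name: str           # stand-in for a fixed-size arrow::DataType
    item_size: int           # bytes per element
    shape: Tuple[int, ...]   # array of dimensions
    order: str = "C"         # "C" (row-major) or "F" (column-major)

    def strides(self) -> Tuple[int, ...]:
        """Byte strides per dimension implied by shape and order."""
        # In Fortran order the first dimension varies fastest;
        # in C order the last dimension varies fastest.
        dims = self.shape if self.order == "F" else self.shape[::-1]
        out, step = [], self.item_size
        for n in dims:
            out.append(step)
            step *= n
        return tuple(out if self.order == "F" else out[::-1])

    def offset(self, index: Tuple[int, ...]) -> int:
        """Byte offset of the element at `index` in the flat buffer."""
        return sum(i * s for i, s in zip(index, self.strides()))


c_meta = TensorMeta("float64", 8, (2, 3), order="C")
f_meta = TensorMeta("float64", 8, (2, 3), order="F")
print(c_meta.strides(), f_meta.strides())
```

The same flat buffer is interpreted differently depending only on the order 
flag, so the data message itself can stay 1-dimensional.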

> [Format] Add a TensorMessage type
> ---------------------------------
>
>                 Key: ARROW-550
>                 URL: https://issues.apache.org/jira/browse/ARROW-550
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Format
>            Reporter: Wes McKinney
>
> Since all data message types at the moment are 1-dimensional, a "tensor" 
> message will contain an array of dimensions and an order flag (C order vs. 
> Fortran order) to enable data to be interpreted as multiple dimensions. This 
> is similar to multidimensional arrays in APL or Fortran or MATLAB, ndarrays 
> in NumPy, etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
