hi Bryan,

I agree this would be useful to work out.

There are a few options:

* Sending multiple tensors as a sequence of encapsulated IPC messages
(as described in
https://github.com/apache/arrow/blob/master/docs/source/format/IPC.rst).
Nothing in the columnar streaming protocol prevents this
* Embedding tensors in BinaryArray columns in some way (e.g. as an
ExtensionType, which we have now in C++)
* Adding Tensor as a logical type (this is essentially ARROW-1614)
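For the second option, here is a minimal sketch in plain Python (not any Arrow API) of what a BinaryArray-style column of tensors would hold. It assumes float64 tensors serialized as little-endian, row-major bytes, one tensor per slot. It also assumes an offsets buffer of n + 1 32-bit integers plus one contiguous data buffer. The shapes live in a side list, standing in for whatever metadata an ExtensionType might carry. `pack_tensors` and `unpack_tensor` are hypothetical helpers, not pyarrow functions.

```python
import struct
from array import array


def pack_tensors(tensors):
    """Pack (shape, flat_values) pairs into a BinaryArray-style layout.

    Hypothetical helper: returns (offsets, data, shapes), where offsets has
    n + 1 entries, data is one contiguous bytes buffer, and shapes is a side
    list standing in for ExtensionType metadata.
    """
    offsets = array("i", [0])  # 'i' is a 32-bit C int on common platforms
    chunks = []
    shapes = []
    for shape, values in tensors:
        count = 1
        for dim in shape:
            count *= dim
        if len(values) != count:
            raise ValueError(f"expected {count} values for shape {shape}")
        raw = struct.pack(f"<{count}d", *values)  # little-endian float64, row-major
        chunks.append(raw)
        offsets.append(offsets[-1] + len(raw))  # end offset of this slot
        shapes.append(tuple(shape))
    return offsets, b"".join(chunks), shapes


def unpack_tensor(offsets, data, shapes, i):
    """Read slot i back as (shape, flat_values) using only its offset range."""
    start, end = offsets[i], offsets[i + 1]
    count = (end - start) // 8  # 8 bytes per float64
    values = list(struct.unpack_from(f"<{count}d", data, start))
    return shapes[i], values


offsets, data, shapes = pack_tensors(
    [((2, 2), [1.0, 2.0, 3.0, 4.0]), ((3,), [5.0, 6.0, 7.0])]
)
print(list(offsets))                            # [0, 32, 56]
print(unpack_tensor(offsets, data, shapes, 1))  # ((3,), [5.0, 6.0, 7.0])
```

Each slot is reached only through its pair of offsets. That is why the width of the offset type bounds the total amount of tensor data one column can hold.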

I would like to understand the use cases more precisely. Perhaps you
can write a design document that describes the use cases in detail and
a proposed solution? This doesn't fall anywhere on my list of 2019
priorities, but I'm happy to give feedback on discussions and review
PRs where relevant.

To embed sequences of tensors in a BinaryArray, we would probably
first need to develop a LargeBinaryArray with 64-bit offsets, so that
buffers can be arbitrarily large (well, within the 64-bit address
space, at least).
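To put a number on that limit, the arithmetic below counts how many equally sized serialized tensors fit in one binary column. It assumes signed offsets, as in the existing BinaryArray, which caps a column at 2^31 - 1 bytes of data with 32-bit offsets. The float32 shape (3, 1024, 1024) is an arbitrary illustration, and the helper names are hypothetical.

```python
# Signed offset limits: the data buffer can be at most this many bytes long
INT32_MAX = 2**31 - 1  # BinaryArray: int32 offsets
INT64_MAX = 2**63 - 1  # proposed LargeBinaryArray: int64 offsets


def max_data_bytes(offset_bits):
    """Largest data buffer addressable by signed offsets of the given width."""
    return {32: INT32_MAX, 64: INT64_MAX}[offset_bits]


def max_tensors_per_column(tensor_bytes, offset_bits):
    """How many equally sized serialized tensors fit in one binary column."""
    return max_data_bytes(offset_bits) // tensor_bytes


# Hypothetical example: float32 tensors of shape (3, 1024, 1024)
tensor_bytes = 3 * 1024 * 1024 * 4  # 12,582,912 bytes (12 MiB) per tensor
print(max_tensors_per_column(tensor_bytes, 32))           # 170
print(max_tensors_per_column(tensor_bytes, 64) > 10**9)   # True
```

With 32-bit offsets, only 170 such tensors fit before the column has to be split. With 64-bit offsets, the practical limit becomes available memory rather than the offset type.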

- Wes

On Fri, Mar 22, 2019 at 1:24 PM Bryan Cutler wrote:
>
> Hi All,
>
> Recently I have been working with the TensorFlow SIG-IO community to
> introduce Apache Arrow-based Datasets for bringing Arrow data into
> TensorFlow. SIG-IO is a community-maintained repository focused on
> input/output support for TF; see https://github.com/tensorflow/io (a lot of
> formats from contrib/ ended up here). Since it is community-driven,
> participation from anyone interested is highly encouraged!
>
> I'm bringing this up for a couple of reasons. First, I want to make sure that
> this stays in line with any related efforts within the Arrow project, and I
> welcome any feedback. Second, the initial response has been great and
> people are excited about using Arrow in other areas of TF, but I've noticed
> some confusion about how Arrow handles tensor data. Specifically, people
> assume that tensors can be part of a RecordBatch and readily used in an
> Arrow stream.
>
> I know we have talked about making tensors a logical type for columnar data
> before in
> https://lists.apache.org/thread.html/6cc86d50d92dbd21d6fc34e34485afb3cab4956fbc0d61ff9b99ea27@%3Cdev.arrow.apache.org%3E
> and there is a JIRA, ARROW-1614, but since work is still needed to fully
> support the current spec for 1.0, I don't think it has moved forward much.
> I'm wondering if now might be a better time to start working on this? I
> think built-in support for tensor columns would really help increase
> adoption of Arrow in frameworks that use tensor data. What are other
> people's thoughts?
>
> Best Regards,
> Bryan
>
