My gut feeling is that such a column type should specify both the shape and
primitive type of all values in the column. I can’t think of a common use
case that requires differently shaped tensors in a single column.

Can anyone here come up with such a use case?

If not, I can try to draft a proposed change to the spec that adds these
types. The next question is whether such a change can make it in (with C++
and Java implementations) before 1.0.
On Mon, Apr 9, 2018 at 17:36 Wes McKinney <wesmck...@gmail.com> wrote:

> > As far as I know, there is an implementation of tensor type in
> > C++/Python already. Should we just finalize the spec and add
> > implementation to Java?
>
> There is nothing specified yet as far as a *column* of
> ndarrays/tensors. We defined Tensor metadata for the purposes of
> IPC/serialization but made no effort to incorporate such data into the
> columnar format.
>
> There are likely many ways to implement a column whose values are
> ndarrays, each cell with its own shape. Whether we would want to
> permit each cell to have a different ndarray cell type is another
> question (i.e., would we want to constrain every cell in a column to
> contain ndarrays of a particular type, like float64).
>
> So there are a couple of questions:
>
> * How to represent the data using the columnar format
> * How to incorporate ndarray metadata into columnar schemas
>
> - Wes
>
> On Mon, Apr 9, 2018 at 5:30 PM, Li Jin <ice.xell...@gmail.com> wrote:
> > As far as I know, there is an implementation of tensor type in C++/Python
> > already. Should we just finalize the spec and add implementation to Java?
> >
> > On the Spark side, it's probably more complicated, as Vector and Matrix
> > are not "first class" types in Spark SQL. Spark ML implements them as
> > UDTs (user-defined types), so it's not clear how to make the Spark/Arrow
> > converter work with them.
> >
> > I wonder if Bryan and Holden have some more thoughts on that?
> >
> > Li
> >
> > On Mon, Apr 9, 2018 at 5:22 PM, Leif Walsh <leif.wa...@gmail.com> wrote:
> >
> >> Hi all,
> >>
> >> I’ve been doing some work lately with Spark’s ML interfaces, which
> >> include sparse and dense Vector and Matrix types, backed on the Scala
> >> side by Breeze. Using these interfaces, you can construct DataFrames
> >> whose column types are vectors and matrices, and though the API isn’t
> >> terribly rich, it is possible to run Python UDFs over such a DataFrame
> >> and get numpy ndarrays out of each row. However, if you’re using
> >> Spark’s Arrow serialization between the executor and Python workers,
> >> you get this UnsupportedOperationException:
> >> https://github.com/apache/spark/blob/252468a744b95082400ba9e8b2e3b3d9d50ab7fa/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala#L71
> >>
> >> I think it would be useful for Arrow to support something like a
> >> column of tensors, and I’d like to see if anyone else here is
> >> interested in such a thing. If so, I’d like to propose adding it to
> >> the spec and getting it implemented in at least Java and C++/Python.
> >>
> >> Some initial mildly-scattered thoughts:
> >>
> >> 1. You can certainly represent these today as List<Double> and
> >> List<List<Double>>, but you then need to do some copying to get them
> >> back into numpy ndarrays.
> >>
> >> 2. In some cases it might be useful to know that a column contains
> >> 3x3x4 tensors, for example, and not just that there are three
> >> dimensions as you’d get with List<List<List<Double>>>. This could
> >> constrain what operations are meaningful (for example, in Spark you
> >> could imagine type checking that verifies dimension alignment for
> >> matrix multiplication).
> >>
> >> 3. You could approximate that with a FixedSizeList and metadata about
> >> the tensor shape.
> >>
> >> 4. But I kind of feel like this is generally useful enough that it’s
> >> worth having one implementation of it (well, one for each runtime) in
> >> Arrow.
> >>
> >> 5. Or, maybe everyone here thinks Spark should just do this with
> >> metadata?
> >>
> >> Curious to hear what you all think.
> >>
> >> Thanks,
> >> Leif
> >>
> >> --
> >> --
> >> Cheers,
> >> Leif
> >>
>
-- 
-- 
Cheers,
Leif