The intention is that each individual record could have a different size.
This could be consistent within a given batch, but wouldn't need to be.
For example, if I wanted to send a 3-channel image, but the image size may
vary for each record, then I could use
FixedSizeList<FixedSizeList<FixedSizeList<Int8>[3]>[-1]>[-1].

On Mon, Jul 29, 2019 at 1:18 PM Brian Hulette <bhule...@apache.org> wrote:

> This isn't really relevant but I feel compelled to point it out - the
> FixedSizeList type has actually been in the Arrow spec for a while, but it
> was only implemented in JS and Java initially. It was implemented in C++
> just a few months ago.
>

Thanks for the clarification -- I was going based on the blame history for
Layout.rst, but I guess it just didn't get officially documented there
until the c++ implementation was added.

-Edward


> On Mon, Jul 29, 2019 at 7:01 AM Edward Loper <edlo...@google.com.invalid>
> wrote:
>
> > The FixedSizeList type, which was added to Arrow a few months ago, is an
> > array where each slot contains a fixed-size sequence of values.  It is
> > specified as FixedSizeList<T>[N], where T is a child type and N is a
> > signed int32 that specifies the length of each list.
> >
> > This is useful for encoding fixed-size tensors.  E.g., if I have a
> > 100x8x10 tensor, then I can encode it as
> > FixedSizeList<FixedSizeList<FixedSizeList<byte>[10]>[8]>[100].
> >
> > But I'm also interested in encoding tensors where some dimension sizes
> > are not known in advance.  It seems to me that FixedSizeList could be
> > extended to support this fairly easily, by simply defining that N=-1
> > means "each array slot has the same length, but that length is not known
> > in advance."  So e.g. we could encode a 100x?x10 tensor as
> > FixedSizeList<FixedSizeList<FixedSizeList<byte>[10]>[-1]>[100].
> >
> > Since these N=-1 row-lengths are not encoded in the type, we need some
> > way to determine what they are.  Luckily, every Field in the schema has a
> > corresponding FieldNode in the message; and those FieldNodes can be used
> > to deduce the row lengths.  In particular, the row length must be equal
> > to the length of the child node divided by the length of the
> > FixedSizeList.  E.g., if we have a FixedSizeList<byte>[-1] array with the
> > values [[1, 2], [3, 4], [5, 6]] then the message representation is:
> >
> > * Length: 3, Null count: 0
> > * Null bitmap buffer: Not required
> > * Values array (byte array):
> >     * Length: 6,  Null count: 0
> >     * Null bitmap buffer: Not required
> >     * Value buffer: [1, 2, 3, 4, 5, 6, <unspecified padding bytes>]
> >
> > So we can deduce that the row length is 6/3=2.
> >
> > It looks to me like it would be fairly easy to add support for this.
> > E.g., in the FixedSizeListArray constructor in c++, if
> > list_type()->list_size() is -1, then set list_size_ to
> > values.length()/length.  There would be no changes to the
> > schema.fbs/message.fbs files -- we would just be assigning a meaning to
> > something that's currently meaningless (having
> > FixedSizeList.listSize=-1).
> >
> > If there's support for adding this to Arrow, then I could put together a
> > PR.
> >
> > Thanks,
> > -Edward
> >
> > P.S. Apologies if this gets posted twice -- I sent it out a couple days
> > ago right before subscribing to the mailing list; but I don't see it on
> > the archives, presumably because I wasn't subscribed yet when I sent it
> > out.
> >
>
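
For illustration, the row-length deduction described in the quoted
proposal can be sketched in plain Python.  This is only a sketch under
stated assumptions, not Arrow code: the names deduce_list_size and
split_rows are hypothetical, the lengths are passed in directly rather
than read from real FieldNodes, and rejecting an empty array is my own
assumption, since the proposal does not say what N=-1 means when the
array length is 0.

```python
def deduce_list_size(declared_size, list_length, child_length):
    """Return the per-slot length of a FixedSizeList array.

    declared_size: the N in FixedSizeList<T>[N]; -1 means "not known in advance".
    list_length:   length taken from the FixedSizeList's own FieldNode.
    child_length:  length taken from the child (values) FieldNode.
    """
    if declared_size >= 0:
        # The size is encoded in the type, so nothing needs to be deduced.
        return declared_size
    if declared_size != -1:
        raise ValueError("list size must be >= 0 or exactly -1")
    if list_length == 0:
        # Assumption: the proposal leaves the empty-array case unspecified.
        raise ValueError("cannot deduce the list size of an empty array")
    if child_length % list_length != 0:
        raise ValueError("child length is not a multiple of the list length")
    return child_length // list_length


def split_rows(values, list_length, declared_size=-1):
    """Slice a flat values buffer into list_length equally sized rows."""
    size = deduce_list_size(declared_size, list_length, len(values))
    return [values[i * size:(i + 1) * size] for i in range(list_length)]


# The example from the proposal: 3 slots backed by 6 child values.
rows = split_rows([1, 2, 3, 4, 5, 6], list_length=3)
print(rows)  # [[1, 2], [3, 4], [5, 6]]
```

The divisibility check is there because a child length that is not a
multiple of the list length would make the deduced size ambiguous.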
