Thank you both for your time and for your answers. I've made a mistake; you
are right :-)

Best,
Jorge



On Sun, Feb 21, 2021 at 1:39 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Hi Jorge,
> I'm not sure I understand your example, but I would expect any child array
> of a fixed size list to always have N*size of fixed size list elements.  So
> for:
> [
>   null,
>   [a, bc],
>   [de, feg]
> ]
>
> (i.e. FixedSizeList<Binary>(2) where length 3.  with the first element is
> null)
>
> I would expect the child array to have [0, 0, 0, 1, 3, 5, 8]  as its
> indices (a total logical length=6).
>
> Which I think corresponds to your second representation of the child
> array?  C++ Validates FixedSizeLists in its validate method to meet this
> conditions [1]
>
> We should probably clarify the specification.
>
> -Micah
>
> [1]
>
> https://github.com/apache/arrow/blob/995abdc02fed412bbd947fe41a0765036dbbe820/cpp/src/arrow/array/validate.cc#L103
>
>
>
>
> On Sun, Feb 21, 2021 at 12:38 AM Jorge Cardoso Leitão <
> jorgecarlei...@gmail.com> wrote:
>
> > Hi,
> >
> > We state in the spec that:
> >
> > A fixed size list type is specified like FixedSizeList<T>[N], where T is
> > > any type (*primitive or nested*) and N is a 32-bit signed integer
> > > representing the length of the lists.
> > >
> >
> > (emphasis mine)
> >
> > Now, suppose that we have FixedSizeList<Binary>[2], i.e. a fixed type
> whose
> > inner is a variable sized type, as follows
> >
> > [
> >     Null,
> >     [
> >         [[0], [1, 2]],
> >         [[3, 4], [5]],
> >     ]
> > ]
> >
> > Looking at the offsets of the binary, two options seem possible according
> > to the spec:
> >
> > 1. [0, 1, 3, 5, 6]  (i.e. inner has len = 4)
> > 2. [0, 0, 0, 1, 3, 5, 6]  (i.e. inner has len = 6)
> >
> > The difference in behavior emerges whenever we want to access the values
> of
> > the i'th slot of the fixed list, e.g. [ [[0], [1, 2]], [[3, 4], [5]] ]
> > above.
> >
> > With option 1, we can't slice the inner using `[i * 2, (i + 1) * 2]`:
> for i
> > = 1 this would correspond to the offsets `[3, 5, 6, out of bounds]` (the
> > result would still be wrong if this was in bounds, as it excluded the
> > `[[0], [1, 2]]`). In this case, we need to count the number of nulls,
> > `nulls`, up to `i` and take `[(i - nulls) * 2, (i - nulls + 1) * 2]`.
> >
> > If we use option 2, we can slice the binary directly using `[i * 2, (i +
> 1)
> > * 2]`: for i = 1, this would correspond to the offsets `[0, 1, 3, 5, 6]`,
> > which is correct.
> >
> > The challenge here is that there is no way to tell whether the inner
> array
> > fulfills this "sliceability" constraint or not. I can't find this
> > constraint in the spec. Do we enforce it somewhere? Note that this
> behavior
> > only affects FixedSizeList, but it does affect all variations whose inner
> > has a variable size (List, Binary, Utf8, etc).
> >
> > Any ideas?
> >
> > Best,
> > Jorge
> >
>

Reply via email to