Hi Jorge,
I'm not sure I understand your example, but I would expect the child array
of a fixed size list to always have N * length elements, where N is the list
size and length is the number of slots in the fixed size list (null slots
included). So for:
[
  null,
  [a, bc],
  [de, feg]
]

(i.e. a FixedSizeList<Binary>(2) of length 3, whose first element is null)

I would expect the child array to have [0, 0, 0, 1, 3, 5, 8] as its
offsets (a logical length of 6).
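To make the expected layout concrete, here is a minimal sketch in plain
Python (not the Arrow API; `child_offsets` is a hypothetical helper). It
assumes each null slot still owns `list_size` empty child entries:

```python
from typing import List, Optional


def child_offsets(slots: List[Optional[List[bytes]]], list_size: int) -> List[int]:
    """Build the Binary offsets of the child of a FixedSizeList<Binary>(list_size).

    Assumption: a null slot still occupies list_size child entries,
    each of them an empty value.
    """
    offsets = [0]
    for slot in slots:
        # Pad a null slot with list_size empty values.
        values = slot if slot is not None else [b""] * list_size
        if len(values) != list_size:
            raise ValueError("every slot must hold exactly list_size values")
        for value in values:
            offsets.append(offsets[-1] + len(value))
    return offsets


slots = [None, [b"a", b"bc"], [b"de", b"feg"]]
offsets = child_offsets(slots, list_size=2)
print(offsets)           # [0, 0, 0, 1, 3, 5, 8]
print(len(offsets) - 1)  # 6, the logical length of the child array
```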

I think this corresponds to your second representation of the child
array. C++ checks this condition for FixedSizeList arrays in its
validation code [1].
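
A minimal sketch of that length condition, assuming an unsliced array (no
array offset). `validate_fixed_size_list` is a hypothetical helper, not a
transcription of validate.cc:

```python
def validate_fixed_size_list(length: int, list_size: int, child_length: int) -> None:
    """Check the expected child length of an unsliced FixedSizeList array.

    Hypothetical helper: every slot, null or not, is assumed to own
    exactly list_size child entries.
    """
    expected = length * list_size
    if child_length != expected:
        raise ValueError(
            f"child length {child_length} != length {length} * list_size {list_size}"
        )


# The example above: length 3, list size 2, child length 6 -> passes.
validate_fixed_size_list(length=3, list_size=2, child_length=6)

# A child that skips the null slot (length 4) is rejected.
try:
    validate_fixed_size_list(length=3, list_size=2, child_length=4)
except ValueError as err:
    print(err)  # child length 4 != length 3 * list_size 2
```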

We should probably clarify the specification.

-Micah

[1]
https://github.com/apache/arrow/blob/995abdc02fed412bbd947fe41a0765036dbbe820/cpp/src/arrow/array/validate.cc#L103




On Sun, Feb 21, 2021 at 12:38 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi,
>
> We state in the spec that:
>
> A fixed size list type is specified like FixedSizeList<T>[N], where T is
> > any type (*primitive or nested*) and N is a 32-bit signed integer
> > representing the length of the lists.
> >
>
> (emphasis mine)
>
> Now, suppose that we have FixedSizeList<Binary>[2], i.e. a fixed size list
> whose inner type is variable-sized, as follows:
>
> [
>     Null,
>     [[0], [1, 2]],
>     [[3, 4], [5]]
> ]
>
> Looking at the offsets of the binary, two options seem possible according
> to the spec:
>
> 1. [0, 1, 3, 5, 6]  (i.e. inner has len = 4)
> 2. [0, 0, 0, 1, 3, 5, 6]  (i.e. inner has len = 6)
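
Reading the example above as a length-3 array (a null slot, then two slots
of two binary values each), here is a plain-Python sketch of how each option
would build the offsets. The helper names are hypothetical:

```python
from typing import List, Optional

Slots = List[Optional[List[bytes]]]


def offsets_option_1(slots: Slots) -> List[int]:
    """Option 1: null slots contribute no child entries."""
    offsets = [0]
    for slot in slots:
        if slot is None:
            continue
        for value in slot:
            offsets.append(offsets[-1] + len(value))
    return offsets


def offsets_option_2(slots: Slots, list_size: int) -> List[int]:
    """Option 2: null slots contribute list_size empty child entries."""
    offsets = [0]
    for slot in slots:
        values = slot if slot is not None else [b""] * list_size
        for value in values:
            offsets.append(offsets[-1] + len(value))
    return offsets


# Each binary value is written as its bytes.
slots = [None, [bytes([0]), bytes([1, 2])], [bytes([3, 4]), bytes([5])]]
print(offsets_option_1(slots))     # [0, 1, 3, 5, 6]
print(offsets_option_2(slots, 2))  # [0, 0, 0, 1, 3, 5, 6]
```
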
>
> The difference in behavior emerges whenever we want to access the values of
> the i'th slot of the fixed size list, e.g. [[0], [1, 2]] (i = 1) or
> [[3, 4], [5]] (i = 2) above.
>
> With option 1, we can't slice the inner array using `[i * 2, (i + 1) * 2]`:
> for i = 1 this selects the offsets `[3, 5, 6]`, i.e. `[[3, 4], [5]]` instead
> of `[[0], [1, 2]]`, and for i = 2 it reads past the end of the offsets. In
> this case, we need to count the number of null slots before `i`, `nulls`,
> and take `[(i - nulls) * 2, (i - nulls + 1) * 2]`.
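
A plain-Python sketch of the option 1 lookup. `slot_values_option_1` is a
hypothetical helper; the offsets and data buffer are modelled as a list and
`bytes`, and the validity bitmap as a list of booleans:

```python
from typing import List, Optional


def slot_values_option_1(
    offsets: List[int], data: bytes, validity: List[bool], i: int, list_size: int
) -> Optional[List[bytes]]:
    """Return the values of slot i when null slots own no child entries."""
    if not validity[i]:
        return None
    # Null slots before i own no child entries, so shift the child index back.
    nulls = sum(1 for valid in validity[:i] if not valid)
    start = (i - nulls) * list_size
    return [data[offsets[j]:offsets[j + 1]] for j in range(start, start + list_size)]


offsets = [0, 1, 3, 5, 6]
data = bytes([0, 1, 2, 3, 4, 5])
validity = [False, True, True]
print(slot_values_option_1(offsets, data, validity, 1, 2))  # [b'\x00', b'\x01\x02']
```

Counting the preceding nulls makes each lookup O(i) unless a running null
count is precomputed.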
>
> If we use option 2, we can slice the inner array directly using
> `[i * 2, (i + 1) * 2]`: for i = 1, this selects the offsets `[0, 1, 3]`,
> i.e. `[[0], [1, 2]]`, which is correct.
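
A plain-Python sketch of the option 2 lookup. `slot_values_option_2` is a
hypothetical helper; the offsets and data buffer are modelled as a list and
`bytes`, and the validity bitmap as a list of booleans. No null counting is
needed because every slot owns exactly `list_size` child entries:

```python
from typing import List, Optional


def slot_values_option_2(
    offsets: List[int], data: bytes, validity: List[bool], i: int, list_size: int
) -> Optional[List[bytes]]:
    """Return the values of slot i when every slot owns list_size child entries."""
    if not validity[i]:
        return None
    # Direct slicing: slot i always starts at child index i * list_size.
    start = i * list_size
    return [data[offsets[j]:offsets[j + 1]] for j in range(start, start + list_size)]


offsets = [0, 0, 0, 1, 3, 5, 6]
data = bytes([0, 1, 2, 3, 4, 5])
validity = [False, True, True]
print(slot_values_option_2(offsets, data, validity, 1, 2))  # [b'\x00', b'\x01\x02']
```
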
>
> The challenge here is that there is no way to tell whether the inner array
> fulfills this "sliceability" constraint or not. I can't find this
> constraint in the spec. Do we enforce it somewhere? Note that this behavior
> only affects FixedSizeList, but it affects every variant whose inner type
> has a variable size (List, Binary, Utf8, etc.).
>
> Any ideas?
>
> Best,
> Jorge
>
