Hi Jorge,

I'm not sure I understand your example, but I would expect the child array of a fixed-size list to always have exactly length * list_size elements (the number of list slots times the fixed list size), regardless of nulls. So for:

[ null, [a, bc], [de, feg] ]
(i.e. FixedSizeList<Binary>(2) of length 3, where the first element is null)

I would expect the child array to have [0, 0, 0, 1, 3, 5, 8] as its offsets (a total logical length of 6). I think this corresponds to your second representation of the child array?

C++ validates FixedSizeLists in its validate method to ensure they meet this condition [1].

We should probably clarify the specification.

-Micah

[1] https://github.com/apache/arrow/blob/995abdc02fed412bbd947fe41a0765036dbbe820/cpp/src/arrow/array/validate.cc#L103

On Sun, Feb 21, 2021 at 12:38 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Hi,
>
> We state in the spec that:
>
> > A fixed size list type is specified like FixedSizeList<T>[N], where T is
> > any type (*primitive or nested*) and N is a 32-bit signed integer
> > representing the length of the lists.
> >
> > (emphasis mine)
>
> Now, suppose that we have FixedSizeList<Binary>[2], i.e. a fixed type whose
> inner is a variable sized type, as follows
>
> [
>   Null,
>   [
>     [[0], [1, 2]],
>     [[3, 4], [5]],
>   ]
> ]
>
> Looking at the offsets of the binary, two options seem possible according
> to the spec:
>
> 1. [0, 1, 3, 5, 6] (i.e. inner has len = 4)
> 2. [0, 0, 0, 1, 3, 5, 6] (i.e. inner has len = 6)
>
> The difference in behavior emerges whenever we want to access the values of
> the i'th slot of the fixed list, e.g. [ [[0], [1, 2]], [[3, 4], [5]] ]
> above.
>
> With option 1, we can't slice the inner using `[i * 2, (i + 1) * 2]`: for i
> = 1 this would correspond to the offsets `[3, 5, 6, out of bounds]` (the
> result would still be wrong if this was in bounds, as it excluded the
> `[[0], [1, 2]]`). In this case, we need to count the number of nulls,
> `nulls`, up to `i` and take `[(i - nulls) * 2, (i - nulls + 1) * 2]`.
>
> If we use option 2, we can slice the binary directly using `[i * 2, (i + 1)
> * 2]`: for i = 1, this would correspond to the offsets `[0, 1, 3, 5, 6]`,
> which is correct.
>
> The challenge here is that there is no way to tell whether the inner array
> fulfills this "sliceability" constraint or not. I can't find this
> constraint in the spec. Do we enforce it somewhere? Note that this behavior
> only affects FixedSizeList, but it does affect all variations whose inner
> has a variable size (List, Binary, Utf8, etc).
>
> Any ideas?
>
> Best,
> Jorge
>
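
To make the layout in the reply concrete, here is a minimal pure-Python sketch of the padded representation (option 2), assuming null slots are filled with empty child values. It does not use the Arrow libraries; the helper names `child_offsets` and `slot_offsets` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence


def child_offsets(slots: Sequence[Optional[Sequence[bytes]]], list_size: int) -> List[int]:
    """Build the offsets of the binary child array for a fixed-size list.

    Assumption: a null slot still occupies `list_size` (empty) child values,
    so the child always has len(slots) * list_size elements.
    """
    offsets = [0]
    for slot in slots:
        values = [b""] * list_size if slot is None else slot
        if len(values) != list_size:
            raise ValueError(f"expected {list_size} values per slot, got {len(values)}")
        for value in values:
            offsets.append(offsets[-1] + len(value))
    return offsets


def slot_offsets(offsets: Sequence[int], i: int, list_size: int) -> List[int]:
    """Return the offsets delimiting the child values of slot i.

    With the padded layout, slot i covers child elements
    [i * list_size, (i + 1) * list_size) with no null counting needed.
    """
    return list(offsets[i * list_size : (i + 1) * list_size + 1])


# The example from the reply: [ null, [a, bc], [de, feg] ] with list size 2.
slots = [None, [b"a", b"bc"], [b"de", b"feg"]]
offsets = child_offsets(slots, 2)

print(offsets)                      # [0, 0, 0, 1, 3, 5, 8]
print(len(offsets) - 1)             # 6 == len(slots) * 2
print(slot_offsets(offsets, 1, 2))  # [0, 1, 3]
print(slot_offsets(offsets, 2, 2))  # [3, 5, 8]
```

The check `len(offsets) - 1 == len(slots) * list_size` is the length condition described in the reply; with it in place, slot `i` can be sliced directly without counting preceding nulls.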