Yeah, I didn't quite follow the example either; it seems like your example actually corresponds to a FixedSizeList<FixedSizeList<Binary>[2]>[2]? Or perhaps FixedSizeList<List<Binary>>[2]? Assuming the former, it seems you'd need additional fixed-size slots to account for the Null element. In Julia, you can inspect the internal structure like this:

julia> c = [missing, ( ([0x00], [0x01, 0x02]), ([0x03, 0x04], [0x05]))]
2-element Vector{Union{Missing, Tuple{Tuple{Vector{UInt8}, Vector{UInt8}}, Tuple{Vector{UInt8}, Vector{UInt8}}}}}:
 missing
 ((UInt8[0x00], UInt8[0x01, 0x02]), (UInt8[0x03, 0x04], UInt8[0x05]))

julia> ac = Arrow.toarrowvector(c)
2-element Arrow.FixedSizeList{Union{Missing, Tuple{Tuple{Vector{UInt8}, Vector{UInt8}}, Tuple{Vector{UInt8}, Vector{UInt8}}}}, Arrow.FixedSizeList{Tuple{Vector{UInt8}, Vector{UInt8}}, Arrow.List{Vector{UInt8}, Int32, Arrow.ToList{UInt8, false, Vector{UInt8}, Int32}}}}:
 missing
 ((UInt8[0x00], UInt8[0x01, 0x02]), (UInt8[0x03, 0x04], UInt8[0x05]))

# binary list data
julia> ac.data.data.data
10-element Arrow.ToList{UInt8, false, Vector{UInt8}, Int32}:
 0x00
 0x00
 0x00
 0x00
 0x00
 0x01
 0x02
 0x03
 0x04
 0x05

# binary list offsets
julia> ac.data.data.offsets
8-element Arrow.Offsets{Int32}:
 (1, 1)
 (2, 2)
 (3, 3)
 (4, 4)
 (5, 5)
 (6, 7)
 (8, 9)
 (10, 10)

On Sun, Feb 21, 2021 at 1:38 AM Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:

> Hi,
>
> We state in the spec that:
>
> A fixed size list type is specified like FixedSizeList<T>[N], where T is
> > any type (*primitive or nested*) and N is a 32-bit signed integer
> > representing the length of the lists.
> >
> > (emphasis mine)
>
> Now, suppose that we have FixedSizeList<Binary>[2], i.e. a fixed type whose
> inner is a variable sized type, as follows
>
> [
>   Null,
>   [
>     [[0], [1, 2]],
>     [[3, 4], [5]],
>   ]
> ]
>
> Looking at the offsets of the binary, two options seem possible according
> to the spec:
>
> 1. [0, 1, 3, 5, 6] (i.e. inner has len = 4)
> 2. [0, 0, 0, 1, 3, 5, 6] (i.e. inner has len = 6)
>
> The difference in behavior emerges whenever we want to access the values of
> the i'th slot of the fixed list, e.g. [ [[0], [1, 2]], [[3, 4], [5]] ]
> above.
>
> With option 1, we can't slice the inner using `[i * 2, (i + 1) * 2]`: for i
> = 1 this would correspond to the offsets `[3, 5, 6, out of bounds]` (the
> result would still be wrong if this was in bounds, as it excluded the
> `[[0], [1, 2]]`). In this case, we need to count the number of nulls,
> `nulls`, up to `i` and take `[(i - nulls) * 2, (i - nulls + 1) * 2]`.
>
> If we use option 2, we can slice the binary directly using `[i * 2, (i + 1)
> * 2]`: for i = 1, this would correspond to the offsets `[0, 1, 3, 5, 6]`,
> which is correct.
>
> The challenge here is that there is no way to tell whether the inner array
> fulfills this "sliceability" constraint or not. I can't find this
> constraint in the spec. Do we enforce it somewhere? Note that this behavior
> only affects FixedSizeList, but it does affect all variations whose inner
> has a variable size (List, Binary, Utf8, etc).
>
> Any ideas?
>
> Best,
> Jorge
>
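
To make the difference between the two layouts concrete, here is a rough sketch in plain Julia. It uses no Arrow.jl calls; `N`, `padded_range`, and `compact_range` are names made up for illustration. It assumes the flattened 2 x 2 reading of the example, so each outer slot holds 4 Binary values, and it models the padding for the null slot as empty values (the Arrow.jl output above pads with one-byte values instead).

```julia
# Hypothetical helpers for illustration only; none of this is Arrow.jl API.

# Assumed number of Binary values per outer slot (2 x 2, flattened).
const N = 4

# Padded layout ("option 2"): a null outer slot still occupies N child slots,
# so 0-based slot i maps directly to child positions i*N+1 .. (i+1)*N (1-based).
padded_range(i) = (i * N + 1):((i + 1) * N)

# Compact layout ("option 1"): null slots occupy no child slots, so the nulls
# before slot i have to be counted and subtracted first.
# Only meaningful when slot i itself is valid.
function compact_range(i, validity)
    nulls = count(!, validity[1:i])   # nulls among 0-based slots 0 .. i-1
    k = i - nulls
    return (k * N + 1):((k + 1) * N)
end

validity = [false, true]   # slot 0 is null, slot 1 is valid

# Child Binary values under each layout.
padded  = [UInt8[], UInt8[], UInt8[], UInt8[],
           [0x00], [0x01, 0x02], [0x03, 0x04], [0x05]]
compact = [[0x00], [0x01, 0x02], [0x03, 0x04], [0x05]]

slot1_padded  = padded[padded_range(1)]
slot1_compact = compact[compact_range(1, validity)]
```

Both lookups return the same four values for slot 1, but only the padded layout gets there without consulting the validity bitmap, which is the "sliceability" property in question.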