Yeah, I didn't quite follow the example either. It looks like it actually
corresponds to FixedSizeList<FixedSizeList<Binary>[2]>[2], or perhaps
FixedSizeList<List<Binary>>[2]? Assuming the former, you'd need additional
fixed-size slots to account for the Null element: the outer null still
occupies 2 inner lists, i.e. 4 binary slots. In Julia, you can inspect the
internal structure like this:

julia> c = [missing, ( ([0x00], [0x01, 0x02]), ([0x03, 0x04], [0x05]))]
2-element Vector{Union{Missing, Tuple{Tuple{Vector{UInt8}, Vector{UInt8}},
Tuple{Vector{UInt8}, Vector{UInt8}}}}}:
 missing
 ((UInt8[0x00], UInt8[0x01, 0x02]), (UInt8[0x03, 0x04], UInt8[0x05]))

julia> ac = Arrow.toarrowvector(c)
2-element Arrow.FixedSizeList{Union{Missing, Tuple{Tuple{Vector{UInt8},
Vector{UInt8}}, Tuple{Vector{UInt8}, Vector{UInt8}}}},
Arrow.FixedSizeList{Tuple{Vector{UInt8}, Vector{UInt8}},
Arrow.List{Vector{UInt8}, Int32, Arrow.ToList{UInt8, false, Vector{UInt8},
Int32}}}}:
 missing
 ((UInt8[0x00], UInt8[0x01, 0x02]), (UInt8[0x03, 0x04], UInt8[0x05]))

# binary list data (the first 4 bytes are placeholders for the null slot)
julia> ac.data.data.data
10-element Arrow.ToList{UInt8, false, Vector{UInt8}, Int32}:
 0x00
 0x00
 0x00
 0x00
 0x00
 0x01
 0x02
 0x03
 0x04
 0x05

# binary list offsets (1-based inclusive ranges; the first 4 cover the null slot)
julia> ac.data.data.offsets
8-element Arrow.Offsets{Int32}:
 (1, 1)
 (2, 2)
 (3, 3)
 (4, 4)
 (5, 5)
 (6, 7)
 (8, 9)
 (10, 10)
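
To make the slicing argument concrete, here is a rough sketch in plain Python
(lists and bytes only, not an Arrow API; every name in it is made up for
illustration). It assumes the FixedSizeList<FixedSizeList<Binary>[2]>[2]
reading and the padded buffers shown in the Arrow.jl output above, rewritten
as 0-based offsets, and compares them with an unpadded layout where null slots
take no space in the child:

```python
# Hypothetical sketch, not an Arrow API: plain Python lists and bytes.
N = 2  # list size at both fixed-size levels (assumed from the example)

# Outer validity: slot 0 is null, slot 1 is valid.
outer_valid = [False, True]

# Padded layout: 2 outer slots * 2 * 2 = 8 binary slots, null slot included.
# These are the 1-based ranges printed above, rewritten as 0-based offsets.
padded_offsets = [0, 1, 2, 3, 4, 5, 7, 9, 10]
padded_data = bytes([0, 0, 0, 0, 0, 1, 2, 3, 4, 5])

# Unpadded layout: the null outer slot occupies nothing in the child.
compact_offsets = [0, 1, 3, 5, 6]
compact_data = bytes([0, 1, 2, 3, 4, 5])


def binaries(offsets, data, start, count):
    """Return `count` binary values starting at binary slot `start`."""
    return [data[offsets[k]:offsets[k + 1]] for k in range(start, start + count)]


def group(values, size):
    """Split a flat list into consecutive chunks of length `size`."""
    return [values[k:k + size] for k in range(0, len(values), size)]


def outer_slot_padded(i):
    """Padded layout: the child range is a pure function of i."""
    if not outer_valid[i]:
        return None
    return group(binaries(padded_offsets, padded_data, i * N * N, N * N), N)


def outer_slot_compact(i):
    """Unpadded layout: we first have to count the valid slots before i."""
    if not outer_valid[i]:
        return None
    valid_before = sum(outer_valid[:i])
    start = valid_before * N * N
    return group(binaries(compact_offsets, compact_data, start, N * N), N)


print(outer_slot_padded(1))
print(outer_slot_compact(1))
```

Both functions return the same nested value for slot 1, but only the padded
one gets there with `i * N` arithmetic alone; the unpadded one needs a pass
over the validity bitmap first.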

On Sun, Feb 21, 2021 at 1:38 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Hi,
>
> We state in the spec that:
>
> > A fixed size list type is specified like FixedSizeList<T>[N], where T is
> > any type (*primitive or nested*) and N is a 32-bit signed integer
> > representing the length of the lists.
>
> (emphasis mine)
>
> Now, suppose that we have FixedSizeList<Binary>[2], i.e. a fixed type whose
> inner is a variable sized type, as follows
>
> [
>     Null,
>     [
>         [[0], [1, 2]],
>         [[3, 4], [5]],
>     ]
> ]
>
> Looking at the offsets of the binary, two options seem possible according
> to the spec:
>
> 1. [0, 1, 3, 5, 6]  (i.e. inner has len = 4)
> 2. [0, 0, 0, 1, 3, 5, 6]  (i.e. inner has len = 6)
>
> The difference in behavior emerges whenever we want to access the values of
> the i'th slot of the fixed list, e.g. [ [[0], [1, 2]], [[3, 4], [5]] ]
> above.
>
> With option 1, we can't slice the inner using `[i * 2, (i + 1) * 2]`: for i
> = 1 this would correspond to the offsets `[3, 5, 6, out of bounds]` (the
> result would still be wrong if this was in bounds, as it excluded the
> `[[0], [1, 2]]`). In this case, we need to count the number of nulls,
> `nulls`, up to `i` and take `[(i - nulls) * 2, (i - nulls + 1) * 2]`.
>
> If we use option 2, we can slice the binary directly using `[i * 2, (i + 1)
> * 2]`: for i = 1, this would correspond to the offsets `[0, 1, 3, 5, 6]`,
> which is correct.
>
> The challenge here is that there is no way to tell whether the inner array
> fulfills this "sliceability" constraint or not. I can't find this
> constraint in the spec. Do we enforce it somewhere? Note that this behavior
> only affects FixedSizeList, but it does affect all variations whose inner
> has a variable size (List, Binary, Utf8, etc).
>
> Any ideas?
>
> Best,
> Jorge
>
