On 11/04/2019 at 10:52, Micah Kornfield wrote:
> ARROW-4810 [1] and ARROW-750 [2] discuss adding types with 64-bit offsets
> to Lists, Strings and binary data types.
>
> Philipp started an implementation for the large list type [3] and I hacked
> together a potentially viable Java implementation [4].
>
> I'd like to kick off the discussion for getting these types voted on. I'm
> coupling them together because I think there are design considerations for
> how we evolve Schema.fbs.
>
> There are two proposed options:
> 1. The current PR proposal which adds a new type LargeList:
> // List with 64-bit offsets
> table LargeList {}
>
> 2. As François suggested, it might be cleaner to parameterize List with
> offset width. I suppose something like:
>
> table List {
> // Only 32-bit and 64-bit offsets are supported.
> bitWidth: int = 32;
> }
>
> I think Option 2 is cleaner and potentially better long-term, but I think
> it breaks forward compatibility of the existing Arrow libraries. If we
> proceed with Option 2, I would advocate making the change to Schema.fbs all
> at once for all types (assuming we think that 64-bit offsets are desirable
> for all types) along with future compatibility checks to avoid multiple
> releases where future compatibility is broken (by broken I mean the
> inability to detect that an implementation is receiving data it can't
> read). What are people's thoughts on this?

I think Option 1 is OK. Making List / String / Binary parameterizable
doesn't bring any *concrete* benefit, since the 32-bit and 64-bit variants
will not be physically interchangeable. The cost of breaking compatibility
should be offset by a compelling benefit, which doesn't seem to exist here.
Of course, implementations are free to refactor their internals to avoid
code duplication (for example the C++ ListBuilder and LargeListBuilder
classes could be instances of a BaseListBuilder<IndexType> generic type)...
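
A rough sketch of that kind of refactoring, to make the idea concrete. It
is not the actual Arrow C++ code: the member names, the `int` child values
and the `std::vector` storage are assumptions made for illustration only.
The point is that the offset type becomes a template parameter while the two
public builder names stay distinct:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical sketch, NOT the real Arrow API: a list builder whose
// offset width is chosen at compile time.
template <typename IndexType>
class BaseListBuilder {
  static_assert(std::is_same<IndexType, std::int32_t>::value ||
                    std::is_same<IndexType, std::int64_t>::value,
                "only 32-bit and 64-bit offsets are supported");

 public:
  // The offsets buffer always starts with a single 0 entry.
  BaseListBuilder() : offsets_{0} {}

  // Append one list slot holding the given child values
  // (child values are plain ints here to keep the sketch short).
  void Append(const std::vector<int>& values) {
    child_values_.insert(child_values_.end(), values.begin(), values.end());
    offsets_.push_back(static_cast<IndexType>(child_values_.size()));
  }

  // Number of list slots appended so far.
  std::size_t length() const { return offsets_.size() - 1; }

  const std::vector<IndexType>& offsets() const { return offsets_; }

  // Size in bytes of one offset: 4 for List, 8 for LargeList.
  static constexpr std::size_t offset_width_bytes() {
    return sizeof(IndexType);
  }

 private:
  std::vector<IndexType> offsets_;
  std::vector<int> child_values_;
};

// The two concrete builders share all logic but produce offset buffers
// of different physical widths.
using ListBuilder = BaseListBuilder<std::int32_t>;
using LargeListBuilder = BaseListBuilder<std::int64_t>;
```

Both aliases expose the same interface, yet their offset buffers differ in
element size, so a reader expecting one layout cannot consume the other
without a conversion.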
Regards
Antoine.