ARROW-4810 [1] and ARROW-750 [2] discuss adding types with 64-bit offsets
to Lists, Strings and binary data types.
Philipp started an implementation for the large list type [3] and I hacked
together a potentially viable Java implementation [4].
I'd like to kick off the discussion for getting these types voted on. I'm
coupling them together because I think there are design considerations for
how we evolve Schema.fbs.
There are two proposed options:
1. The current PR proposal which adds a new type LargeList:
// List with 64-bit offsets
table LargeList {}
2. As François suggested, it might be cleaner to parameterize List with an
offset width. I suppose something like:
table List {
// Only 32-bit and 64-bit widths are supported.
bitWidth: int = 32;
}
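To illustrate what the offset width changes at the buffer level, here is a
rough Python sketch. It is not Arrow code: build_offsets is a hypothetical
helper, and it only assumes what the format already specifies, namely that a
variable-length array of n values carries n + 1 signed little-endian offsets.

```python
import struct


def build_offsets(values, bit_width=32):
    """Pack n + 1 Arrow-style offsets for n variable-length values.

    Hypothetical helper for illustration only; bit_width mirrors the
    proposed bitWidth field and must be 32 or 64.
    """
    if bit_width not in (32, 64):
        raise ValueError("only 32-bit and 64-bit offsets are supported")
    fmt = "<i" if bit_width == 32 else "<q"  # signed, little-endian
    max_offset = 2 ** (bit_width - 1) - 1
    offsets = [0]
    for value in values:
        end = offsets[-1] + len(value)
        if end > max_offset:
            # With 32-bit offsets this triggers once the child data
            # exceeds 2**31 - 1 elements (or bytes, for strings/binary).
            raise OverflowError(
                f"offset {end} does not fit in {bit_width} bits"
            )
        offsets.append(end)
    return b"".join(struct.pack(fmt, o) for o in offsets)


buf32 = build_offsets([b"ab", b"", b"cde"], bit_width=32)
buf64 = build_offsets([b"ab", b"", b"cde"], bit_width=64)
```

For these three values the 32-bit buffer is 16 bytes and the 64-bit one is
32 bytes; the offsets themselves (0, 2, 2, 5) are identical, so the only
differences are the per-offset cost and the maximum addressable length.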
I think Option 2 is cleaner and potentially better long-term, but I think
it breaks forward compatibility of the existing Arrow libraries. If we
proceed with Option 2, I would advocate making the change to Schema.fbs all
at once for all types (assuming we think that 64-bit offsets are desirable
for all types) along with forward compatibility checks to avoid multiple
releases where forward compatibility is broken (by broken I mean the
inability to detect that an implementation is receiving data it can't
read). What are people's thoughts on this?
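To make the compatibility concern concrete, here is a toy Python model of
the two options. It is only a sketch: the dict-based field metadata, the
function names and the set of known types are all made up for illustration.
The one real behavior it leans on is that a FlatBuffers reader generated
from an older schema ignores fields it does not know about.

```python
# Hypothetical set of type names an already-released reader understands.
OLD_KNOWN_TYPES = {"List", "Utf8", "Binary"}


class UnsupportedTypeError(Exception):
    """Raised when a reader detects data it cannot interpret."""


def old_reader_offset_width(field):
    """Model a reader built against the schema before either option."""
    if field["type"] not in OLD_KNOWN_TYPES:
        # Option 1: LargeList is a new type, so the old reader can at
        # least detect that it cannot read the data.
        raise UnsupportedTypeError(field["type"])
    # Option 2: bitWidth is unknown to the old schema and is silently
    # ignored, so the offsets are assumed to be 32-bit.
    return 32


def new_reader_offset_width(field):
    """Model a reader that understands both proposals."""
    if field["type"] == "LargeList":
        return 64
    if field["type"] not in OLD_KNOWN_TYPES:
        raise UnsupportedTypeError(field["type"])
    width = field.get("bitWidth", 32)  # default matches the proposal
    if width not in (32, 64):
        raise UnsupportedTypeError(f"bitWidth={width}")
    return width


option1_field = {"type": "LargeList"}
option2_field = {"type": "List", "bitWidth": 64}
```

In this model the old reader rejects option1_field but accepts
option2_field and treats it as having 32-bit offsets, which is the
undetected misread described above.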
Also, are there any other concerns with adding these types?
Thanks,
Micah
[1] https://issues.apache.org/jira/browse/ARROW-4810
[2] https://issues.apache.org/jira/browse/ARROW-750
[3] https://github.com/apache/arrow/pull/3848
[4] https://github.com/apache/arrow/commit/03956cac2202139e43404d7a994508080dc2cdd1