In a recent mailing list discussion [1] Micah Kornfield has proposed
to add new list and variable-size binary and unicode types to the
Arrow columnar format with 64-bit signed integer offsets, to be used
in addition to the existing 32-bit offset varieties. These will be
implemented as new types in the Type union in Schema.fbs (the
particular names can be debated in the PR that implements them):

LargeList
LargeBinary
LargeString [UTF8]

While very large contiguous columns are not a principal use case for
the columnar format, it has been observed empirically that there are
applications that use the format to represent datasets where
realizations of data can sometimes exceed the 2^31 - 1 "capacity" of a
column and cannot be easily (or at all) split into smaller chunks.
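
To make the limit concrete, here is a minimal sketch in plain Python
(not Arrow library code; the helper names build_offsets and
pack_offsets are hypothetical, and the two 1 GiB values are an assumed
example). It builds the cumulative offsets for a variable-size binary
column and shows that the last offset no longer fits in a signed 32-bit
integer once the values total 2^31 bytes, while 64-bit offsets hold the
same values without trouble:

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def build_offsets(value_lengths):
    """Return cumulative offsets (length n + 1) for n variable-size values."""
    offsets = [0]
    for length in value_lengths:
        offsets.append(offsets[-1] + length)
    return offsets


def pack_offsets(offsets, wide=False):
    """Serialize offsets as little-endian int32 (or int64 when wide=True).

    struct.pack raises struct.error if a value does not fit the chosen width.
    """
    code = "q" if wide else "i"
    return struct.pack(f"<{len(offsets)}{code}", *offsets)


# Hypothetical column: two 1 GiB binary values, 2^31 bytes of data in total.
offsets = build_offsets([2**30, 2**30])

try:
    pack_offsets(offsets)  # 32-bit offsets: the last offset is 2^31
    fits_32 = True
except struct.error:
    fits_32 = False

wide = pack_offsets(offsets, wide=True)  # 64-bit offsets hold the same values
```

Only the offsets change width in this sketch; the value bytes themselves
are untouched, which is why the proposal can be expressed as additional
types alongside the existing 32-bit offset ones.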

Please vote whether to accept the changes. The vote will be open for at
least 72 hours.

[ ] +1 Accept the additions to the columnar format
[ ] +0
[ ] -1 Do not accept the changes because...

Thanks,
Wes

[1]: https://lists.apache.org/thread.html/8088eca21b53906315e2bbc35eb2d246acf10025b5457eccc7a0e8a3@%3Cdev.arrow.apache.org%3E
