On Thu, 15 Aug 2019 11:17:07 -0700
Micah Kornfield <emkornfi...@gmail.com> wrote:
> >
> > In C++ they are
> > independent, we could have 32-bit array lengths and variable-length
> > types with 64-bit offsets if we wanted (we just wouldn't be able to
> > have a List child with more than INT32_MAX elements).
>
> I think the point is we could do this in C++ but we don't.  I'm not sure we
> would have introduced the "Large" types if we did.

64-bit offsets take twice as much space as 32-bit offsets, so if you're
storing lots of small-ish lists or strings, 32-bit offsets are
preferable.  So even with 64-bit array lengths from the start it would
still be beneficial to have types with 32-bit offsets.

> Going with the limited address space in Java and calling it a reference
> implementation seems suboptimal. If a consumer uses a "Large" type
> presumably it is because they need the ability to store more than INT32_MAX
> child elements in a column, otherwise it is just wasting space [1].

Probably.  Though if the individual elements (lists or strings) are
large, not much space is wasted in proportion, so it may be simpler in
such a case to always create a "Large" type array.

> [1] I suppose theoretically there might be some performance benefits on
> 64-bit architectures to using the native word sizes.

Concretely, common 64-bit architectures don't do that, as 32-bit is an
extremely common integer size even in high-performance code.

Regards

Antoine.
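
For rough numbers on the space argument above, here is a back-of-the-envelope
sketch in Python.  It assumes the usual variable-length layout (one offsets
buffer with n + 1 entries plus one data buffer) and ignores validity bitmaps
and padding.  The function name and the element counts and sizes are made up
for illustration.

```python
def offsets_overhead(num_elements, avg_element_bytes, offset_bytes):
    """Return (offsets buffer size in bytes, its share of offsets + data)."""
    # One offset per element, plus one trailing offset marking the end.
    offsets_size = (num_elements + 1) * offset_bytes
    data_size = num_elements * avg_element_bytes
    return offsets_size, offsets_size / (offsets_size + data_size)


# Many small strings (1M elements of 8 bytes each).
small_32 = offsets_overhead(1_000_000, 8, 4)
small_64 = offsets_overhead(1_000_000, 8, 8)

# Few large strings (1000 elements of 1 MB each).
large_32 = offsets_overhead(1_000, 1_000_000, 4)
large_64 = offsets_overhead(1_000, 1_000_000, 8)

for label, (size, share) in [
    ("small elements, 32-bit offsets", small_32),
    ("small elements, 64-bit offsets", small_64),
    ("large elements, 32-bit offsets", large_32),
    ("large elements, 64-bit offsets", large_64),
]:
    print(f"{label}: {size} bytes of offsets, {share:.4%} of the total")
```

With small elements the offsets go from about a third of the total to about
half when switching to 64-bit.  With large elements they are a negligible
fraction either way.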