Hi Alkis,

Thanks for all your work on this proposal.

I'd be in favour of keeping the offsets as i64 and not reducing the maximum
row group size, even if this results in slightly larger footers. I've heard
from some of our users within G-Research that they do have files with row
groups > 2 GiB. This is often when they use lower-level APIs to write
Parquet that don't automatically split data into row groups, and they
either write a single row group for simplicity or have some logical
partitioning of data into row groups. They might also have wide tables with
many columns, or wide array- or tensor-valued columns that lead to large row
groups.
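
As one illustration of how this happens, here's a minimal sketch (PyArrow
used purely as an example of a low-level writer; the users I mentioned are
on various writers, and the file name and sizes are made up):

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Toy table standing in for billions of rows of wide or tensor-valued data.
    table = pa.table({"values": pa.array(range(10))})

    with pq.ParquetWriter("large.parquet", table.schema) as writer:
        # A single write call with row_group_size >= len(table) produces
        # exactly one row group, so nothing stops it growing past 2 GiB.
        writer.write_table(table, row_group_size=len(table))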

In many workflows we don't read Parquet through a query engine that supports
filter pushdown and row-group skipping; we either read all rows, or directly
specify the row groups to read when there is some known logical partitioning
into row groups. I'm sure we could work around a 2 or 4 GiB row group size
limit if we had to, but it's a new constraint that reduces the flexibility
of the format and creates extra work for users, who would now need to ensure
they don't hit the limit.
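
To make the second pattern concrete, a minimal sketch (again PyArrow, just
for illustration, with a made-up file name):

    import pyarrow.parquet as pq

    pf = pq.ParquetFile("data.parquet")

    # Either read everything, so row group boundaries don't matter...
    full_table = pf.read()

    # ...or read only the row groups known to hold the slice of interest,
    # e.g. when each row group was written for one logical partition.
    subset = pf.read_row_groups([0, 3])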

Do you have any measurements of how much of a difference 4-byte offsets
make to footer sizes in your data, with and without the optional LZ4
compression?
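
For context, my own rough back-of-envelope (purely hypothetical file shape,
and assuming a full 4 bytes saved per column chunk offset, which is an upper
bound for variable-length encodings):

    # Hypothetical numbers for illustration only, not a measurement.
    num_row_groups = 100
    num_columns = 1000
    column_chunks = num_row_groups * num_columns
    saving = column_chunks * 4   # i64 -> i32, at most 4 bytes per offset
    print(saving)                # 400000 bytes, ~400 KB before any LZ4

So the per-chunk difference looks small next to everything else stored per
column chunk, but real numbers from your data would settle this much better
than my guesswork.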

Thanks,
Adam

On Tue, 14 Oct 2025 at 21:02, Alkis Evlogimenos
<[email protected]> wrote:

> Hi all,
>
> From the comments on the Parquet metadata document
> <https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.0>,
> it appears there's a general consensus on most aspects, with the exception
> of the relative 32-bit offsets for column chunks.
>
> I'm starting this thread to discuss this topic further and work towards a
> resolution. Adam Reeve suggested raising the limitation to 2^32, and he
> confirmed that Java does not have any issues with this. I am open to this
> change as it increases the limit without introducing any drawbacks.
>
> However, some still feel that a 2^32-byte limit for a row group is too
> restrictive. I'd like to understand these specific use cases better. From
> my perspective, for most engines, the row group is the primary unit of
> skipping, making very large row groups less desirable. In our fleet's
> workloads, it's rare to see row groups larger than 100MB, as anything
> larger tends to make statistics-based skipping ineffective.
>
> Cheers,
>
