In case anyone else is interested, I think the relevant parts of parquet.thrift are [1] and [2].
I agree with Gang's interpretation: `num_nulls` in DataPageHeaderV2 is required, whereas `null_count` in Statistics is optional. Since Statistics is used in other places (e.g. ColumnMetaData [3]), I don't think we could make `null_count` required there (not that you were proposing this).

A bit off topic, but I think including Statistics in page headers is generally of limited use. To read them you need to have already fetched the page, so by the time you have the page header, the amount of work that can be skipped is often quite small. A better approach is to include the statistics in the ColumnIndex [4], which can be fetched independently and then used to skip many pages at once.

Andrew

[1]: https://github.com/apache/parquet-format/blob/9fd57b59e0ce1a82a69237dcf8977d3e72a2965d/src/main/thrift/parquet.thrift#L724
[2]: https://github.com/apache/parquet-format/blob/9fd57b59e0ce1a82a69237dcf8977d3e72a2965d/src/main/thrift/parquet.thrift#L291
[3]: https://github.com/apache/parquet-format/blob/9fd57b59e0ce1a82a69237dcf8977d3e72a2965d/src/main/thrift/parquet.thrift#L912
[4]: https://github.com/apache/parquet-format/blob/9fd57b59e0ce1a82a69237dcf8977d3e72a2965d/src/main/thrift/parquet.thrift#L1163

On Thu, Oct 9, 2025 at 2:15 AM Gang Wu wrote:
> I think you're right.
>
> The only difference is that statistics is optional but the field in the
> header is required.
>
> Best,
> Gang
>
> On Wed, Oct 8, 2025 at 8:12 PM Antoine Pitrou wrote:
>
> > Hello,
> >
> > It seems a V2 data page can have its number of nulls recorded in two
> > adjacent locations:
> > 1. the `num_nulls` field in `DataPageHeaderV2`
> > 2. the `null_count` field in `DataPageHeaderV2.statistics`
> >
> > Is this interpretation right? Or do those two fields actually have
> > different semantics?
> >
> > Regards
> >
> > Antoine.
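A minimal Python sketch of the two points discussed in this thread. It assumes simplified, hypothetical stand-ins for the Thrift structs (these dataclasses and helper functions are illustrative only, not part of any Parquet library). A reader can treat the required `num_nulls` as authoritative and only cross-check the optional `statistics.null_count` when it is present. Separately, the per-page `null_pages` (required) and `null_counts` (optional) lists in a ColumnIndex can be used to choose pages before any page header is fetched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Statistics:
    # Hypothetical stand-in: only the null-count part of parquet.thrift's
    # Statistics struct. null_count is optional there, so it may be None.
    null_count: Optional[int] = None


@dataclass
class DataPageHeaderV2:
    # Hypothetical stand-in: num_nulls is required, statistics is optional.
    num_values: int
    num_nulls: int
    num_rows: int
    statistics: Optional[Statistics] = None


def page_null_count(header: DataPageHeaderV2) -> int:
    """Return the page's null count from the required num_nulls field.

    If the optional statistics.null_count is also set, check that the two
    agree (assuming, as the thread suggests, they have the same semantics).
    """
    stats = header.statistics
    if stats is not None and stats.null_count is not None:
        if stats.null_count != header.num_nulls:
            raise ValueError(
                f"num_nulls={header.num_nulls} disagrees with "
                f"statistics.null_count={stats.null_count}"
            )
    return header.num_nulls


@dataclass
class ColumnIndex:
    # Hypothetical stand-in with only the fields used here: null_pages is
    # required (True if a page is all nulls), null_counts is optional.
    null_pages: List[bool]
    null_counts: Optional[List[int]] = None


def pages_to_read_for_is_null(index: ColumnIndex) -> List[int]:
    """Indices of pages that may contain nulls (for an IS NULL filter)."""
    if index.null_counts is None:
        # Without per-page null counts, no page can be ruled out.
        return list(range(len(index.null_pages)))
    return [i for i, n in enumerate(index.null_counts) if n > 0]


def pages_to_read_for_is_not_null(index: ColumnIndex) -> List[int]:
    """Indices of pages that may contain non-null values (IS NOT NULL)."""
    return [i for i, all_null in enumerate(index.null_pages) if not all_null]
```

For example, with `null_counts=[0, 4, 1]` an IS NULL filter only needs pages 1 and 2, and this is decided from the ColumnIndex alone, without fetching page 0 at all.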
